# Category : Assembly Language Source Code

Archive : ASMTUT2.ZIP

Filename : CHAP8.DOC


CHAPTER 8 - SHIFT AND ROTATE

There are seven instructions that move the individual bits of a

byte or word either left or right. Each instruction works

slightly differently. We'll make a standard program and then

substitute each instruction into that program.

SAL - SHL

The instructions SHL (shift logical left) and SAL (shift

arithmetic left) are exactly the same. They have the same machine

code. They shift each bit to the left. How far? That depends.

There are two (and only two) forms of this instruction. All other

shift and rotate instructions have these two (and only these two)

forms as well. The first form is:

shl al, 1

This shifts each bit to the left one bit. The number MUST be 1.

No other number is possible. The other form is:

shl al, cl

This shifts the bits in AL to the left by the number in CL. If CL = 3,

it shifts left by 3. If CL = 7, it shifts left by 7. The count

register MUST be CL (not CX). The bits on the left are shifted

out of the register into the bit bucket, and zeros are inserted

on the right. The easy way to understand this is to fire up the

standard program. Remember, from now on we always use

template.asm.

;sal.asm

; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE

mov ax_byte, 0A3h ; half reg, low reg binary

mov bx_byte, 0A4h ; half reg, low reg hex

mov cx_byte, 0A1h ; half reg, low reg signed

mov dx_byte, 0A2h ; half reg, low reg unsigned

lea ax, ax_byte

call set_reg_style

mov ax, 0 ; clear registers

mov bx, 0

mov cx, 0

mov dx, 0

mov di, 0

mov bp, 0

call show_regs

outer_loop:

call get_hex_byte ; get number and put in registers

mov bl, al

mov cl, al

______________________

The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson


mov dl, al

mov si, 8 ; 8 iterations of the loop

and al, al ; set the flags

call show_regs_and_wait

shift_loop:

sal al, 1

sal bl, 1

sal cl, 1

sal dl, 1

call show_regs_and_wait

dec si

jnz shift_loop

jmp outer_loop

; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

This standard program works with bytes, not words. With words we

would have performed 16 individual shifts, and that would have

been time consuming and boring. First we set

the style to half registers. Notice that one is binary, one is

hex, one is signed and one is unsigned. That covers all bases.

All the registers are then cleared. It would be nice to use the

LOOP instruction, but CX is committed, so we build our own loop.

We move 8 into SI as the counter. The loop instructions are:

dec si

jnz shift_loop

DEC decrements a register or a variable by 1. Its counterpart INC

increments a register or variable by 1. JNZ (jump if not zero)

jumps to 'shift_loop' if SI is not zero.

We get a hex byte in AL and put the same byte in BL, CL, and DL.

This way we will be able to see what is happening in binary, hex,

signed and unsigned. Before starting, we have:

and al, al

This is there to set the flags correctly before starting. All

four are shifted left one bit each time, and then we look at the

result.

Assemble, link and run it. Enter the number 7. In binary, that is

(0000 0111). Take a look at the flags before starting. It is a

positive number so SF shows '+'. ZF is not set. PF shows 'O'. O

stands for odd. Every time you perform an arithmetic or logical

operation, the 8086 checks parity. Parity is whether the number

contains an even or odd number of 1 bits. This contains 3 1 bits,

so the parity is odd. The possible settings are 'E' for even and

'O' for odd.{1} SAL checks for parity (though some of the other

instructions don't). Now press ENTER. It will shift left 1 and

you will have (0000 1110). What does the unsigned number say now?

14. Press ENTER again. (0001 1100) What does the unsigned number

say? 28. Again (0011 1000) 56. Again (0111 0000) 112. Notice that

____________________

1 This is for use by communications programs.


the signed number reads +112. Look at the CF and OF. They are

both cleared. Things are going to change now. Press ENTER again.

(1110 0000). SF is now '-'. OF, the overflow flag is set because

you changed the number from positive to negative (from +112 to

-32). What is the unsigned number now? 224. CF is cleared. PF is

'O'. Shift again. (1100 0000) OF is cleared because you didn't

change signs. (Remember, the leftmost bit is the sign bit for a

signed number). PF is now 'E' because you have two 1 bits, and

two is even. CF is set because you shifted a 1 bit off the left

end. Keep pressing ENTER and watch SF, OF, CF, and PF.
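The walk-through above can be replayed away from the PC. The following is a minimal sketch in Python (not assembler) that models a one-bit SHL on a byte and the flags the text describes. The name shl8 and the dictionary of flags are made up for this illustration; they are not part of template.asm or ASMHELP.OBJ.

```python
def shl8(value):
    """Model SHL/SAL by 1 on an 8-bit value; returns (result, flags)."""
    carry = (value >> 7) & 1        # the bit pushed off the left end goes to CF
    result = (value << 1) & 0xFF    # a 0 comes in on the right
    sign = (result >> 7) & 1        # SF copies the new high bit
    flags = {
        "CF": carry,
        "SF": sign,
        "OF": carry ^ sign,         # set when the sign bit changed
        "ZF": int(result == 0),
        "PF": "E" if bin(result).count("1") % 2 == 0 else "O",
    }
    return result, flags


# Replay the example from the text: start with 7 and shift six times.
value = 0x07
history = [value]
for _ in range(6):
    value, flags = shl8(value)
    history.append(value)

print(history)   # [7, 14, 28, 56, 112, 224, 192]
print(flags)     # flags after the last shift (224 -> 192)
```

The last step (224 to 192) is the one where a 1 bit falls off the left end, so the model reports CF = 1, OF = 0 and PF = 'E', matching the description above.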

Let's look at the unsigned numbers we had until we started

shifting 1 bits off the left end. We started with 7, then had 14,

28, 56, 112, 224. This instruction is multiplying by 2. That's

right, and it is MUCH faster than multiplication (about 50 times

faster). Far and away the fastest way to multiply a register by

2, 4 or 8 is to use sal.

; by 2          ; by 4          ; by 8

sal di, 1       sal di, 1       sal di, 1

                sal di, 1       sal di, 1

                                sal di, 1

For a register, it is faster to use a series of 1 shifts than to

load cl. For a variable in memory, anything over 1 shift is

faster if you load cl.

Do a few more numbers to see what is happening both with the

number and the flags. CF always signals when a 1 bit has been

shifted off the end.

SAR and SHR

Unlike the left shift instruction, there are two completely

different right shift instructions. SHR (shift logical right)

shifts the bits to the right, setting CF if a 1 bit is pushed off

the right end. It puts 0s in the leftmost bit. Make a copy of

SAL.ASM and replace the four instructions:

sal al, 1

sal bl, 1

sal cl, 1

sal dl, 1

with SHR. We'll call the new program SHR.ASM. Run this one too.

Instead of 7, use E0h (1110 0000) which is 224d. The first time

you shift (0111 0000) the OF flag will be set because the sign

changed. Keep shifting, noting the flags and the unsigned number.

This time we have 224, 112, 56, 28, 14, 7, 3, 1. It is dividing

by two and is once again MUCH faster than division. For a single

shift, the remainder is in CF. For a shift of more than one bit,

you lose the remainder, but there is a way around this which we

will discuss in a moment. Do some more numbers till you are

comfortable with the flags and the operation.

If you want to divide by 16, you will shift right four times, so


you'll lose those 4 bits. But those bits are exactly the value of

the remainder. All we need to do is:

mov dx, ax ; copy of number to dx

and dx, 0000000000001111b ; remainder in dx

mov cl, 4 ; shift right 4 bits

shr ax, cl ; quotient in ax

Using a mask, we keep only the right four bits, which is the

remainder.
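If you want to convince yourself that the mask really holds the remainder, here is a short Python sketch of the same two steps (mask, then shift). The function name shr_divide is hypothetical and only used for this illustration.

```python
def shr_divide(value, shift):
    """Unsigned divide by 2**shift using a mask and a right shift."""
    mask = (1 << shift) - 1      # shift = 4 gives 0000 1111b
    remainder = value & mask     # like: and dx, 0000000000001111b
    quotient = value >> shift    # like: shr ax, cl
    return quotient, remainder


print(shr_divide(1000, 4))   # (62, 8) because 62 * 16 + 8 = 1000
```

The result agrees with ordinary division for every 16-bit unsigned value, which is easy to check with a loop over all 65536 numbers.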

SAR

SAR (shift arithmetic right) is different. It shifts right like

SHR, but the leftmost bit always stays the same. This will make

more sense when you run the program. Make another copy, call it

SAR.ASM, and change the four instructions to SAR. The flags

operate the same as for SHR and SHL, except that the overflow

flag is always cleared since the left bit always stays the same.

First enter 74h (+116). We will be looking at the signed numbers

only. Copy down the signed numbers as you go along. They should

be: 116, 58, 29, 14, 7, 3, 1, 0, 0. Now try 8Ch (-116). The

numbers you should get are: -116, -58, -29, -15, -8, -4, -2, -1,

-1. They started out the same, then they got off by one. The

negative numbers are one too negative. Try 39h (+57). The

numbers here are: 57, 28, 14, 7, 3, 1, 0, 0, 0. Just as it should

be for division by 2. Now try C7h (-57). Here the numbers are:

-57, -29, -15, -8, -4, -2, -1, -1, -1. This time it went screwy

right off the bat. Once again, the negative numbers are one too

negative.
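The four sequences above can be reproduced with a small Python model of a one-bit SAR on a byte. This is only a sketch of the behavior described in the text; the names sar8, to_signed and sar_sequence are invented for the illustration.

```python
def sar8(value):
    """Model SAR by 1 on an 8-bit pattern held as 0..255."""
    sign = value & 0x80            # the high (sign) bit is kept as it is
    return (value >> 1) | sign


def to_signed(value):
    """Read an 8-bit pattern as a signed number."""
    return value - 256 if value & 0x80 else value


def sar_sequence(start, count=8):
    """Starting value followed by the result of each of 'count' shifts."""
    values = [start]
    for _ in range(count):
        values.append(sar8(values[-1]))
    return [to_signed(v) for v in values]


print(sar_sequence(0x74))   # [116, 58, 29, 14, 7, 3, 1, 0, 0]
print(sar_sequence(0x8C))   # [-116, -58, -29, -15, -8, -4, -2, -1, -1]
```

Notice that the negative sequence never reaches 0; it gets stuck at -1 because the sign bit keeps being copied in from the left.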

SAR is an instruction for doing signed division by 2 (sort of).

It is, however, an incomplete instruction. The rule for SAR is:

SAR gives the correct answer if the number is positive. It gives

the correct answer if the number is negative and the remainder is

zero. If the number is negative but there is a remainder, then

the answer is one too negative. The reason for this is a little

complex, but we need to add some code if we want to do signed

division.{2} For SHR, the remainder part was optional. Here it is

not. We need to know whether the remainder is zero or not. For

this example we will do a word shift right by 6. That's dividing

by 64.

remainder_mask dw 003Fh ; 63

call get_signed ; number in ax

mov bx, ax ; copy in bx

and bx, remainder_mask ; the remainder

mov cl,6 ; shift right 6 bits

sar ax, cl

jns continue ; is it positive?

____________________

2 Both the code and the reasons will be explained (but not

proved) in the summary.


and bx, bx ; is the remainder zero?

jz continue

inc ax

continue:

We get the remainder, then shift right 6 bits. Upon finishing

SAR, the sign flag will be set correctly. Here is yet another

jump. JNS (jump on not sign) jumps if the sign flag

is NOT set, that is if the number is positive. If it is positive,

then everything is ok so we skip ahead. If the number is

negative, then we check to see if there was a remainder. If there

wasn't, everything is ok, so we go ahead. If there was a

remainder, then we INC (add 1) ax.

Is the remainder correct? If the number was positive, the

remainder is correct, but if the number was negative, then we

need to do one more thing. After INC, but before 'continue' we

have a SUB instruction:

inc ax

sub bx, 64 ; correct the remainder

continue:

Why that is the correct number will be explained in the summary.

What a lot of work when we could simply write:

mov cx, 64

call get_signed

cwd ; sign extend

idiv cx ; signed division

Is there any advantage to this instruction? Not really. Remember

that the more you shift, the longer it takes. If you shift 2,

then it's about 1/3 faster than division. If you shift 14, then

it is only 15% faster than division. Considering that even a slow

PC can do 25000 divisions a second, you must be in serious need

of speed to use this. In any case, you will never or almost never

use SAR for signed division, while you will find lots of

opportunity to use SHR and SHL for unsigned multiplication and

division.
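For the curious, the SAR division with its two corrections (INC for the quotient, SUB for the remainder) can be checked against ordinary signed division. The following Python sketch models the steps described above; sar_divide and truncating_divide are hypothetical names, and truncating_divide stands in for what the text expects from IDIV (a quotient rounded toward zero and a remainder with the sign of the dividend).

```python
def sar_divide(number, shift):
    """Signed divide by 2**shift, following the SAR code step by step."""
    divisor = 1 << shift
    remainder = number & (divisor - 1)    # and bx, remainder_mask
    quotient = number >> shift            # sar ax, cl (Python's >> keeps the sign)
    if quotient < 0 and remainder != 0:   # the jns and jz tests
        quotient += 1                     # inc ax
        remainder -= divisor              # sub bx, 64
    return quotient, remainder


def truncating_divide(number, divisor):
    """Reference: quotient rounded toward zero, remainder signed like number."""
    quotient = abs(number) // divisor
    if number < 0:
        quotient = -quotient
    return quotient, number - quotient * divisor


print(sar_divide(-100, 6))           # (-1, -36)
print(truncating_divide(-100, 64))   # (-1, -36)
```

Without the two corrections the shift alone would give -2 with a masked value of 28 for this input, which is the "one too negative" behavior seen in the experiments.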

ROR and ROL

ROR (rotate right) and ROL (rotate left) rotate the bits around

the register. We will just do one program since they operate the

same way, only in opposite directions. Make another copy of

SAL.ASM and put in ROR in the appropriate spots.

Enter a number. This time you will notice that the bits, rather

than disappearing off the end, reappear on the other side. They

rotate around the register. The only flags that are defined are

OF and CF. OF is set if the high bit changes, and CF is set if a

1 bit moves off the end of the register to the other side. Do a

few more, and we'll go on to the last two instructions.
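Here is a rough Python model of a one-bit ROR and ROL on a byte, in case you want to predict what the program will show. It is a sketch of the behavior described above, with made-up function names, and it reports only CF.

```python
def ror8(value):
    """Rotate an 8-bit value right by 1; returns (result, CF)."""
    carry = value & 1                          # low bit leaves on the right
    result = (value >> 1) | (carry << 7)       # and reappears on the left
    return result, carry


def rol8(value):
    """Rotate an 8-bit value left by 1; returns (result, CF)."""
    carry = (value >> 7) & 1                   # high bit leaves on the left
    result = ((value << 1) & 0xFF) | carry     # and reappears on the right
    return result, carry


print(ror8(0x81))   # (192, 1): 1000 0001 becomes 1100 0000
print(rol8(0x81))   # (3, 1):   1000 0001 becomes 0000 0011
```

Eight rotations in the same direction bring a byte back to where it started, since no bit is ever lost.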


RCR and RCL

RCR (rotate through carry right) and RCL (rotate through carry

left) rotate the same as the above instructions except that the

carry flag is involved. Rotating right, the low bit moves to CF,

the carry flag and CF moves to the high bit. Rotating left, the

high bit moves to CF and CF moves to the low bit. There are 9

bits (or 17 bits for a word) involved in the rotation. Make yet

another copy of the program, and change those 4 instructions to

RCR. Also, since we have 9 bits instead of 8, change the loop

count to 9 from 8:

mov si, 9

Enter a number and watch it move. Before you start moving, look

at CF and see if there is anything in it. There are only two

flags defined, OF and CF. Obviously, CF is set if there is

something in it. OF is weird. In RCL (the opposite instruction to

the one we are using), OF operates normally, signalling a change

in the top (sign) bit. In RCR, OF is set when the old CF differs

from the old top bit. Since the old CF becomes the new top bit,

that is again a change in the top bit, though on screen it looks

as if OF were following CF. You really have no need for the OF

flag anyway, so this is unimportant.
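The nine-bit rotation is easy to model. The Python sketch below treats CF as a separate value that is passed in and returned; rcr8 and rcl8 are hypothetical names used only here.

```python
def rcr8(value, carry):
    """Rotate right through carry by 1; returns (result, new CF)."""
    new_carry = value & 1                      # low bit moves to CF
    result = (value >> 1) | (carry << 7)       # old CF moves to the high bit
    return result, new_carry


def rcl8(value, carry):
    """Rotate left through carry by 1; returns (result, new CF)."""
    new_carry = (value >> 7) & 1               # high bit moves to CF
    result = ((value << 1) & 0xFF) | carry     # old CF moves to the low bit
    return result, new_carry


# Nine rotations bring the byte and CF back to where they started.
value, carry = 0xA5, 1
for _ in range(9):
    value, carry = rcr8(value, carry)
print(hex(value), carry)   # 0xa5 1
```

This is why the loop count in the program is changed from 8 to 9: it takes nine steps for every bit, including the one sitting in CF, to travel all the way around.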

Well, those are the seven instructions, but what can you do with

them besides multiply and divide?

First, you can work with multiple bit data. The 8087 has a word

length register called the status register. Looking at the upper

byte:

15 14 13 12 11 10  9  8

       X  X  X

Bits 11, 12 and 13 contain a number from 0 to 7. The data in this

register is not directly accessible. You need to move the

register into memory, then into an 8086 register. If you want to

find what this number is, what do you do?

mov bx, status_register_data

mov cl, 3

ror bx, cl

and bh, 00000111b

We rotate right 3 and then mask off everything else. The number

is now in BH. We could have used SHR if we wanted. Another 8087

register is the control register. In the upper byte it has:

15 14 13 12 11 10  9  8

             X  X

a number from 0 to 3 in bits 10 and 11. If we want the

information, we do the same thing:

mov bx, control_register_data

mov cl, 2

ror bx, cl


and bh, 00000011b

and the number is in BH.
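Both extractions follow the same pattern: rotate the field down to the bottom of the high byte, then mask. Here is a Python sketch of that pattern for checking your reasoning; ror16 and field_in_bh are names made up for this example, and the sample words are arbitrary test values, not real 8087 register contents.

```python
def ror16(value, count):
    """Rotate a 16-bit value right by 'count' bits."""
    count %= 16
    return ((value >> count) | (value << (16 - count))) & 0xFFFF


def field_in_bh(word, rotate, mask):
    """Rotate the word right, then mask the high byte (the BH register)."""
    bx = ror16(word, rotate)     # ror bx, cl
    return (bx >> 8) & mask      # and bh, mask


# Bits 13 and 11 set, bit 12 clear: the three-bit field reads 101b = 5.
print(field_in_bh(0x2800, 3, 0b00000111))   # 5
# Bit 11 set, bit 10 clear: the two-bit field reads 10b = 2.
print(field_in_bh(0x0800, 2, 0b00000011))   # 2
```

The same values come out if you shift right by 11 (or 10) and mask the low bits instead, which is the SHR alternative the text mentions.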

You are now going to write a program that inputs an unsigned

number and prints out its hex representation. Here it is:

; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE

mov ax_byte, 0A5h ; half regs, right ascii

mov bx_byte, 4 ; hex

mov dx_byte, 4 ; hex

lea ax, ax_byte

call set_reg_style

call show_regs

outer_loop:

call get_unsigned

mov bx, ax

mov dx, ax

mov cx, 4

inner_loop:

push cx ; save cx

mov cl, 4

rol bx, cl ; rotate left 1/2 byte

mov al, bl ; copy to al

and al, 0Fh ; mask off upper 1/2 byte

cmp al, 10 ; < 10, 0 - 9 ; > 9 A - F

jae use_letters

add al, '0' ; change to ascii

jmp print_it

use_letters:

add al, 'A' - 10 ; 10 = 'A'

print_it:

call print_ascii_byte

call show_regs_and_wait

pop cx

loop inner_loop

jmp outer_loop

; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

AL will be shown in ascii while BX and DX will be in hex. We save

the original number in DX. Since the first thing we want to print

is the left hex character, we rotate left, not right. We move the

low byte to AL, mask off everything but the low hex number and

then convert to an ascii character. If it is 0 - 9, we add '0'

(the character, not the number). If it is > 9, we add "'A' - 10"

and get a letter (if the number is 10, we get 'A'). JAE means

jump if above or equal, and is an unsigned comparison.{3}

____________________

3 You are getting inundated with conditional jump

instructions. Don't worry. As long as you understand each one

when you run across it, you don't have to remember it. All jump

instructions will be covered soon.


Finally, we print the ascii character that is in AL.{4}

Another thing to notice is that just inside the loop we push CX.

That is because we use CL for the ROL instruction. It is then

POPped just before the loop instruction. This is typical. CX is

the only register that can be used for counting in indexed

instructions. It is common for indexing instructions to be

nested, so you temporarily store the old value of CX while you

are using CX for something different.

push cx ; typical code for a shift

mov cl, 7

shr si, cl

pop cx
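The hex printing loop shown earlier (rotate left by half a byte, mask, convert to a character) can be modeled in a few lines of Python. This is only a sketch of the logic, not a replacement for the assembler program; rol16 and to_hex are hypothetical names, and the characters are collected in a string instead of being printed one at a time.

```python
def rol16(value, count):
    """Rotate a 16-bit value left by 'count' bits."""
    count %= 16
    return ((value << count) | (value >> (16 - count))) & 0xFFFF


def to_hex(number):
    """Build the 4-character hex form of a 16-bit unsigned number."""
    bx = number & 0xFFFF
    digits = []
    for _ in range(4):                 # mov cx, 4 ... loop inner_loop
        bx = rol16(bx, 4)              # rol bx, cl with cl = 4
        al = bx & 0x0F                 # mov al, bl / and al, 0Fh
        if al >= 10:                   # cmp al, 10 / jae use_letters
            al += ord('A') - 10        # 10 becomes 'A'
        else:
            al += ord('0')             # 0 becomes '0'
        digits.append(chr(al))
    return "".join(digits)


print(to_hex(6703))   # 1A2F
```

After the fourth rotation BX is back to its original value, which is why the assembler program can keep using BX without reloading it.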

Finally, let's multiply large numbers by 2. Here's the code:

; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE

byte1 db ?

byte2 db ?

byte3 db ?

byte4 db ?

error_message db "Result is too large.", 0

; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE

; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE

outer_loop:

lea ax, byte1 ; get 4 byte number

call get_unsigned_4byte

shl byte1, 1

rcl byte2, 1

rcl byte3, 1

rcl byte4, 1

jnc go_on

lea ax, error_message

call print_string

go_on:

lea ax, byte1

call print_unsigned_4byte

jmp outer_loop

; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

This will require some explanation. Get_unsigned_4byte gets a

number from 1 to four billion. We put it in memory. Normally, the

following instructions would be done word by word. We are doing

them byte by byte so you can see the mechanics of the situation.

The low byte is shifted left 1 bit. This doubles it, but may

shift a 1 bit from the high bit into CF. If it does, then it will

be present when we rotate byte2. That moves CF into the low bit

and moves the high bit into CF. We do it again. And again. If

there is an unsigned overflow, it will be signalled by CF being

____________________

4 Any subroutine in ASMHELP.OBJ that involves a one byte input

or output has the data in AL.


set after:

rcl byte4, 1

JNC (jump on not carry) will skip the error message if everything

is ok. Print_string prints a zero terminated string, that is a C

string which is terminated by the number (not the character) 0.

Finally, we print the number.
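To see the carry travel from byte to byte, here is a Python sketch of the SHL followed by three RCLs. The function name double_4byte is invented for the illustration, and the list data stands in for byte1 through byte4 in memory (low byte first).

```python
def double_4byte(number):
    """Double a 4-byte unsigned number one byte at a time.

    Returns (result, CF), where CF = 1 means the result was too large.
    """
    data = list(number.to_bytes(4, "little"))   # data[0] is byte1, the low byte
    carry = 0                                   # SHL brings in a 0 on the right
    for i in range(4):                          # shl byte1, then rcl byte2..byte4
        shifted = (data[i] << 1) | carry        # old CF enters the low bit
        data[i] = shifted & 0xFF
        carry = shifted >> 8                    # old high bit becomes CF
    return int.from_bytes(bytes(data), "little"), carry


print(double_4byte(0x80))         # (256, 0): the 1 bit moved into byte2
print(double_4byte(0x80000000))   # (0, 1): the 1 bit fell off byte4
```

The first pass through the loop behaves like SHL because the incoming carry is 0; the other three behave like RCL because they pick up whatever the previous byte pushed out.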

A word about large numbers in ASMHELP.OBJ. It is assumed that you

would like to use commas if you could. Any data type over 1 word

long allows commas. The following are considered the same by

ASMHELP.OBJ in its input routines:

23546787

2,3,5,4,6,7,8,7

23,,5,46,,78,7

23,546787

23,546,787

It always prints commas correctly in the print routines.


SUMMARY

All shift and rotate instructions operate on either a register or

on memory. They can be either 1 bit shifts:

sal cx, 1

ror variable1, 1

shr bl, 1

or shifts indexed by CL (it must be CL):

rcl variable2, cl

sar si, cl

rol ah, cl

SHL and SAL

SHL (shift logical left) and SAL (shift arithmetic left) are

exactly the same instruction. They move bits left. 0s are

placed in the low bit. Bits are shoved off the register (or

memory data) on the left side, and CF indicates whether the

last bit shoved was a 1 or a 0. It is used for multiplying

an unsigned number by powers of 2.

SHR

SHR (shift logical right) does the same thing as SHL but in

the opposite direction. Bits are shifted right. 0s are

placed in the high bit. Bits are shoved off the register (or

memory data) on the right side and CF indicates whether the

last bit shoved off was a 0 or a 1. It is used for dividing

an unsigned number by powers of 2.

SAR

SAR (shift arithmetic right) shifts bits right. The high

(sign) bit stays the same throughout the operation. Bits are

shoved off the register (or memory data) on the right side.

CF indicates whether the last bit shoved off was a 1 or a 0.

It is used (with difficulty) for dividing a signed number by

powers of 2.

ROR and ROL

ROR (rotate right) and ROL (rotate left) rotate the bits of

a register (or memory data) right and left respectively. The

bit which is shoved off one end is moved to the other end.

CF indicates whether the last bit moved from one end to the

other was a 1 or a 0.

RCR and RCL


RCR (rotate through carry right) and RCL (rotate through

carry left) rotate the bits of a register (or of memory

data) right and left respectively. The bit which is shoved

off the register (or data) is placed in CF and the old CF is

placed on the other side of the register (or data).

INC

INC increments a register or a variable by 1.

inc ax

inc variable1

DEC

DEC decrements a register or a variable by 1.

dec ax

dec variable1

The following is fairly technical. It is only for those willing

to wade their way through a turgid explanation. If you don't

understand it, forget it.

CODE FOR SHR

If you are shifting an UNSIGNED number right by 'X' bits, it is

the same as dividing by (2 ** X): 1 bit = (2**1 = 2), 2 bits =

(2**2 = 4), 7 bits = (2**7 = 128). This is the same as dividing

by a number which is all 0s except the Xth bit which is 1 (for 0

we have 0000 0001, for 1 we have 0000 0010, for 3 we have 0000

1000, for 7 we have 1000 0000). The remainder mask will be this

number minus 1 (for 0 we have 0000 0000, for 1 we have 0000 0001,

for 3 we have 0000 0111, for 7 we have 0111 1111).

CODE FOR SAR

The order of numbers is important for SAR. If you start with 0

and add 1 each time, the actual sequence of signed numbers that

you get (from the bottom up) is:

-1

-2

.

.

-32767

-32768

+32767

+32766

.

.

3

2

1

0


The positive numbers are increasing in absolute value while the

negative numbers are decreasing in absolute value. If you divide

by shifting and there is no remainder, then the quotient is

exact. If there is a remainder, the quotient will truncate

towards 0 IN THE ABOVE DIAGRAM. This means that positive numbers

will truncate down, while the negative numbers will truncate

towards -32768, and will be one too negative.

If the number was positive, the remainder will be positive and

will be exactly the same as for SHR. If the number was negative,

then things are more complicated. We'll take division by 32 as an

example. If we divide by 32 (0010 0000) the remainder mask will

be 31 (0001 1111). If the number is negative, then what we get

when we AND the mask:

and ax, 00011111b

is not the remainder but (remainder + 32). In order to get the

actual negative remainder, we need to subtract 32. This gives us

(remainder + 32 - 32).

remainder mask = divisor - 1

negative remainder correction = NEG divisor.