Category : Assembly Language Source Code
Archive   : PICALC.ZIP
Filename : PI386.LST

 
Microsoft (R) Macro Assembler Version 6.1a 10/01/94 20:15:00
pi386.ASM Page 1 - 1


; File: PI386.ASM System: PI Calc Version: 1.00 Date: 10-01-94 ;

;-----------------------------------------------------------------------------
; PI386.COM computes and displays number pi using spigot algorithm (due to S.
; Rabinowitz). About 4932*n digits are output, where n=1-1000 is specified
; command-line input. 80386 CPU, 128K RAM, and 66*nK disk space are required.
; Both time and total I/O increase as n*(n+1)/2. Beep flags error, e.g., I/O
; error or pre-386 CPU. The 1000-limit is artificial, but it would take a few
; months to compute five million digits anyway. One million digits (n=203)
; takes about 4.5 days on 486/66MHz machine. Overhead due to file I/O is
; typically one to two percent of total time.
;
; Syntax: PI386 [n | RESUME | DISPLAY] where n=1-1000
;
; E.g., PI386 4 Computes 19,720+ decimal digits of pi
; PI386 R Resumes execution after previous keypress halt
; PI386 D > PI.TXT Displays final/interim output as ASCII digits
;
; Program state is maintained in fixed-length PI386.WRK file. The file
; contains the spigot array of 16*nK dwords (64*nK bytes) plus header info.
; The array is processed in 64K-byte blocks. Block read/writes take place
; whenever displayed WRK counter changes (every 19 seconds on 486/66MHz).
; A total of n*(n+1)/2 of these blocks are processed during run. That is, n
; blocks are processed on first pass through array, n-1 blocks on second pass,
; etc., and one block on last pass. The passes become shorter, since trailing
; insignificant array blocks are ignored when no longer needed.
;
; Output is appended to PI386.OUT file at the end of each pass, when the OUT
; counter changes. Output chunks are 549 dwords, with each dword representing
; 9 digits. The last chunk is shorter. Any normal keypress halts execution,
; but response is delayed until WRK counter changes, in synch with updates to
; WRK file. The DISPLAY option may be used to expand the interim or final
; OUT file to console (or redirected to file) as ASCII digits.
;
; Whenever an n value (new or same) is specified, existing OUT/WRK files are
; overwritten. Use the RESUME option, without n specified, to continue a run
; with existing files. If there is a power loss during a run or a Ctrl-C
; break (instead of delayed break by other keypress), the files will probably
; be ok. Header info is saved to the work file frequently (with each 64K
; block write) so that you can try restart from a power loss with little lost
; computation time. However, if the snapshot of computation state to disk
; was incomplete, the restart will fail. File copies should be saved before
; restarts for this reason. The WRK file may be deleted manually when
; calculation is done--when all display counters are zeroed. A restart when
; done causes no harm. You may retain the compact OUT file results or the
; equivalent ASCII text from the DISPLAY option.
;
; This program is designed for one machine. The algorithm, however, is ideal
; for pipelining with multiple machines that share Q buffer I/O. This would
; allow using the same basic method to compute pi to higher precision than is
; feasible here in reasonable real time on DOS machines.
;
; There are faster algorithms for computing pi, but this may be the simplest,
; since messy fast Fourier transforms are not needed. See PICALC.COM for
; a tiny version of the spigot algorithm (85 bytes), limited to 9860 digits.
; The limit there is due to use of word, rather than dword, array elements.
;
; PI386.COM also differs since the array is handled in 64K blocks (requiring
; file I/O), interim q values are buffered (reducing the file I/O), and the
; effective length of the array is scaled down linearly during a run (roughly
; halving execution time). Output display is a separate command-line option,
; since buffering of q values prevents immediate output anyway and since a
; specific format was desired. The simpler PICALC.COM displays output
; immediately as an unformatted stream of digits. PI386.COM is 1K in size,
; with 720 bytes of code and the rest trailing data/text.
;
; Output was verified to 1.25 million digits against the Project Gutenberg
; PIMIL10.TXT (same size) downloaded from the Internet. The display format
; here matches that file. For reference, pi is:
;
; 3.
; 1415926535 8979323846 2643383279 5028841971 6939937510
; 5820974944 5923078164 0628620899 8628034825 3421170679 <-- 100th digit
; ...
; 5982534904 2875546873 1159562863 8823537875 9375195778
; 1857780532 1712268066 1300192787 6611195909 2164201989 <-- 1000th digit
; ...
; 2645600162 3742880210 9276457931 0657922955 2498872758
; 4610126483 6999892256 9596881592 0560010165 5256375678 <-- 10000th digit
; ...
; 8575016363 4113146275 3049901913 5646823804 3299706957
; 7015078933 7728658035 7127909137 6742080565 5493624646 <-- 100000th digit
; ...
; 0315614033 3212728491 9441843715 0696552087 5424505989
; 5678796130 3311646283 9963464604 2209010610 5779458151 <-- 1000000th digit
; ...
;
; C. Hessel/ER Support/DOE/Germantown MD
;-----------------------------------------------------------------------------
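; The spigot pass described above can be sketched in a few lines of Python.
; This is an illustration only, not part of PI386: it assumes the whole array
; fits in memory (no 64K blocks, no Q buffering), and the names spigot_dwords
; and elts are hypothetical.

```python
RADIX = 10**9  # ten raised to DGTS = 9, as in the listing


def spigot_dwords(elts, count):
    """Return the first `count` output dwords from an array of `elts` 2's."""
    a = [2] * elts                          # every a[k] starts as 2
    out = []
    for _ in range(count):
        q = 0                               # carry entering the top element
        for k in range(elts - 1, 0, -1):
            x = a[k] * RADIX + q
            q, a[k] = divmod(x, 2 * k + 1)  # remainder stays in a[k]
            q *= k                          # carry handed down to a[k-1]
        x = a[0] * RADIX + q
        d, a[0] = divmod(x, RADIX)          # k = 0: peel off one output dword
        out.append(d)
    return out


print(spigot_dwords(128, 4))  # [3, 141592653, 589793238, 462643383]
```

; The first dword is the leading 3, and each later dword carries 9 decimal
; digits. With 128 elements the series tail is far too small to disturb the
; four dwords shown.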
0000 Code_Seg SEGMENT USE16 ; Want DOS-based 16-bit segments
ASSUME cs:Code_Seg,ds:Code_Seg,es:Code_Seg

.486 ; Will only use 386 instructions, but
; want 486 timings in listing
= xchg MUV EQU ; Saves byte on some AX moves
= 000E BLKPWR = 14 ; Determines high buffer size--14 maximum
= 4000 BLKELTS = 1 SHL BLKPWR ; Dword elements per buffer--16K maximum
= 4D104D42 LOGTWO = 4D104D42h ; 65536*65536*log(2) base 10, truncated
= 03E8 NMAX = 1000 ; Cap on input n
= 0009 DGTS = 9 ; Digits per dword--9 maximum
= 3B9ACA00 RADIX = 1000000000 ; Ten raised to power DGTS
= 08900895 FACTOR = LOGTWO/DGTS ; Multiplier used to get output dword count
= 0012 TEMP = 32-BLKPWR ; Construct QELTS value that forces IterCnt
= 0224 TEMP = FACTOR SHR TEMP ; equal to or just under MaxBlk (n)
= 0225 QELTS = TEMP+1 ; Quadwords in Q buffer, dwords in P buffer
= 0010 PARMSIZE = 16 ; Parm space at header start--see EOF
= 1138 HDRSIZE = PARMSIZE+8*QELTS ; Work file header size--includes Q buffer
= 0008 TEMP = HDRSIZE MOD 16 ; Keep work file blocks that follow header
= 0008 TEMP = (16-TEMP) MOD 16 ; 16-aligned for debugging convenience,
= 1140 HDRSIZE = HDRSIZE+TEMP ; but need at least 4-aligned
= 4000 OUTSIZE = 4000h ; Read buffer size for DISPLAY option--
; ensure 256-multiple
;-----------------------------------------------------------------------------
; See EOF for data. First order of business is to check command-line and
; handle four possibilities: D, R, no parm, or digits. Other input is
; interpreted by digit handler as zero value for n, which causes abort.
;-----------------------------------------------------------------------------
ORG 100h ; COM file start
0100 2 FC Begin: cld ; Default direction (could assume from DOS)
0101 1 BE 005D mov si,5Dh ; PSP first command arg--uppercase
0104 1 8A 04 mov al,[si] ; Fetch first byte, leaving SI as is
0106 1 3C 44 cmp al,"D" ; for MiscInit
0108 3,1 74 2F je Display ; Display of output file wanted? Ahead

010A 1 3C 52 cmp al,"R"
010C 3,1 74 34 je Resume ; Resume wanted? Ahead

010E 1 3C 20 cmp al," "
0110 3,1 75 42 jne Start ; Not space? Expect command-line n (digits)

0112 1 BA 03E9 R DispSyn: mov dx,OFFSET Syntax ; Else show syntax info and exit via PSP
0115 1 B4 09 Disp: mov ah,9 ; DOS display $-terminated string
0117 30 CD 21 toint: int 21h
0119 5 C3 ret

EVEN ; Force even offset for BP flag in Dump
011A 3 92 OutChr: MUV dx,ax ; Output character AL--called by Dump/Abort
011B 1 B4 02 mov ah,2 ; DOS display character DL
011D 3 EB F8 jmp toint


011F 3 E8 0003 OutCrLf3: call OutCrLf ; Output 3 CrLfs--called only by Dump
0122 3 E8 0000 OutCrLf2: call OutCrLf ; Output 2 CrLfs
0125 1 BA 04FB R OutCrLf: mov dx,OFFSET CrLf ; Output CrLf, positioned here for
0128 3 EB EB jmp Disp ; short jump

012A 3 E8 0213 Prep: call Check ; RAM/386 checks, returning AX/CX/EBX/BP
012D 1 2D 1000 sub ax,1000h ; Back up 64K, still above stack
0130 3p 8E C0 mov es,ax ; Save as buffer segment for 64K spigot
0132 3 E8 FFDD call DispSyn ; Display syntax, setting DX to offset
0135 1 B2 D0 mov dl,LOW (OFFSET MsgHdr-OFFSET Begin)
0137 3 EB DC jmp Disp ; Need MsgHdr and Syntax in same page

0139 3 E8 0204 Display: call Check ; RAM/386 checks, returning AX/CX/EBX/BP
013C 3 E8 0287 call OpenOne ; Open output file, setting handle BX
013F 3 E9 0199 jmp Dump ; Dump digits to console, with BX handle
; and CX/high EBX zero and BP odd
0142 3 E8 FFE5 Resume: call Prep ; See above
0145 3 E8 0276 call OpenTwo ; Open work and output files read/write
0148 1 B8 4202 mov ax,4202h ; DOS seek from EOF
014B 3 99 cwd ; Zero DX (CX zero from Prep)
014C 3 E8 01DD call Int21h ; Seek to EOF (handle BX from OpenTwo)
014F 1 BD 3F00 mov bp,3F00h ; DOS read on reentry
0152 3 EB 72 jmp Reenter ; Near bottom of main loop below--this is
; leap of faith in integrity of files
0154 3 E8 FFD3 Start: call Prep ; See above
0157 3 E8 021E call MiscInit ; Miscellaneous, CX/EBX/SI input, SI output
015A 1 BD 4000 mov bp,4000h ; DOS write for next and ApndBlk
015D 3 E8 01B0 call SvLdHdr ; Write work file header, setting handle BX
0160 3 E8 00F7 initloop: call ApndBlk ; Append SI initial blocks of dword 2's to
0163 1 4E dec si ; work file--last dword in last block
0164 3,1 75 FA jne initloop ; could be set to 4 (see MiscInit)
;-----------------------------------------------------------------------------
; Main loop, ending with scaled reduction in MaxBlk to cut run time roughly in
; half. QELTS was set carefully so that reduction is just decrement. Initial
; MaxBlk and IterCnt are same for n under 554. For larger n (or for smaller
; BLKPWR or if DGTS/RADIX reduced), IterCnt is less than MaxBlk. In such a
; case, last iteration operates on more than one array block.
;-----------------------------------------------------------------------------
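; A rough sketch (Python, illustrative only) of how many 64K blocks the
; nested blkloop/iterloop below touches in a full run. The function name
; blocks_processed is hypothetical; its arguments are the initial MaxBlk and
; IterCnt values.

```python
def blocks_processed(max_blk, iter_cnt):
    """Count block read/write cycles over a whole run."""
    total = 0
    while iter_cnt > 0:
        total += max_blk   # blkloop walks CurBlk from MaxBlk down to 1
        max_blk -= 1       # dec MaxBlk
        iter_cnt -= 1      # dec IterCnt
    return total


print(blocks_processed(4, 4))      # 10, i.e. n*(n+1)/2 for n = 4
print(blocks_processed(554, 553))  # 153734: last iteration covers 2 blocks
```

; When MaxBlk and IterCnt start equal, the total is n*(n+1)/2. Otherwise the
; run stops while MaxBlk is still above 1.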
0166 1 A1 0470 R iterloop: mov ax,MaxBlk ; Start from (reduced) MaxBlk
0169 1 A3 0476 R blkloop: mov CurBlk,ax ; Save current block number
016C 1 50 push ax ; To stack for later
016D 1 48 dec ax ; Make block number zero-based for next
016E 1 BD 3F00 mov bp,3F00h ; DOS read
0171 3 E8 00E0 call SvLdBlk ; Load CurBlk from disk
0174 1 BF 0480 mov di,OFFSET QBuf ; Reset pointers into Q buffer
0177 1 BE 15B0 mov si,OFFSET PBuf ; and P buffer
017A 1 56 push si ; Save as possible write offset after loop
017B 1 A1 0472 R mov ax,IterCnt
017E 1 48 dec ax ; Test if in last outer iteration
017F 1 B8 0225 mov ax,QELTS ; In case not, set usual Q buffer count
0182 3,1 75 03 jne qloop ; Not last? Ahead

0184 1 A1 0474 R mov ax,QLast ; Else reduce Q (and P) buffer count
0187 1 50 qloop: push ax ; Save Q buffer count to stack
0188 1 48 dec ax ; Make zero-based for display
0189 1 A8 0F test al,0Fh ; Display every 16 qloop iterations
018B 3,1 75 06 jne skipdisp ; Not 16-multiple? Skip

018D 3 E8 015A call PrepStat ; Prepare message--AX input/DX output
0190 3 E8 FF82 call Disp ; Display status--DX input
0193 1 66| 8B 1D skipdisp: mov ebx,[di] ; Get next ECX:EBX from Q buffer without
0196 1 66| 8B 4D 04 mov ecx,[di+4] ; advancing pointer
019A 3 E8 0043 call BufPass ; Returns ECX:EBX and EDX
019D 1 66| 89 1D mov [di],ebx ; Save ECX:EBX to Q buffer (was zeroed for
01A0 1 66| 89 4D 04 mov [di+4],ecx ; next outer loop if this is digit pass)
01A4 1 83 C7 08 add di,8
01A7 1 66| 89 14 mov [si],edx ; Save possible digits EDX to P Buffer
01AA 1 83 C6 04 add si,4 ; (later ignored if not digit pass)
01AD 1 58 pop ax ; Restore Q buffer count
01AE 1 48 dec ax ; and decrement
01AF 3,1 75 D6 jne qloop ; Not done? Loop

01B1 1 5A pop dx ; Restore P buffer offset (used if digits)
01B2 1 5F pop di ; Restore block number
01B3 1 BD 4000 mov bp,4000h ; DOS write for File21h/SvLdBlk/SvLdHdr
01B6 1 4F dec di ; Decrement, testing if lowest block
01B7 3,1 75 09 jne nodigits ; No? Then no digits this pass else save
; P buffer data from offset DX
01B9 1 8B CE mov cx,si ; Offset after last EDX store above
01BB 1 2B CA sub cx,dx ; Bytes of P buffer data (usually 4*QELTS)
01BD = 01BE OutHandl = $+1 ; EBX is zero from BufPass, so only a low
01BD 1 B3 00 mov bl,0 ; byte load is needed to fetch handle
01BF 3 E8 015A call File21h ; Append P buffer data to output file
01C2 3 97 nodigits: MUV ax,di ; Decremented block number to AX
01C3 3 E8 008E call SvLdBlk ; Save block back to work file
01C6 3 E8 0147 Reenter: call SvLdHdr ; Save parms and Q buffer to work file
; (or load on reentry with 3Fh)
01C9 1 B4 01 mov ah,1 ; BIOS check for keypress made in synch
01CB 30 CD 16 int 16h ; with file updates, so response delayed
01CD 3,1 75 10 jne done ; Yes? Early exit

01CF 1 A1 0476 R mov ax,CurBlk ; Current block number--because of loop
01D2 1 48 dec ax ; reentry, this cannot be on stack
01D3 3,1 75 94 jne blkloop ; Not lowest block? Loop

01D5 3 FF 0E 0470 R dec MaxBlk ; Reduce maximum work file block referenced
01D9 3 FF 0E 0472 R dec IterCnt ; Parallel reduction in iteration count
01DD 3,1 75 87 jne iterloop ; Not last iteration? Loop

01DF 5 C3 done: ret ; Exit via int 20h in PSP, closing files
;-----------------------------------------------------------------------------
; Innermost buffer pass, using only registers in loop, except for load/store.
; Input is high buffer contents and q value ECX:EBX. Buffer contents and q
; value are updated on output, and, if this is digit pass (CurBlk=1), output
; dword is returned in EDX. On digit pass, output q value ECX:EBX is zeroed
; for start of next outer iteration.
;
; Unrolling loop to reduce jumps would improve performance slightly. Over 98%
; of program time is spent here on typical machine, with file I/O and status
; display accounting for nearly all the rest.
;-----------------------------------------------------------------------------
01E0 1 57 BufPass: push di
01E1 1 56 push si
01E2 1 66| BE 3B9ACA00 mov esi,RADIX ; Fixed radix for multiply/divide
01E8 1 BF FFFC mov di,4*BLKELTS-4 ; Point to top dword of buffer
01EB 1 66| 8B 2E 0476 R mov ebp,CurBlkD ; Fetch CurBlk (high word zero)
01F0 2 66| C1 E5 0F shl ebp,BLKPWR+1 ; Multiply by 2*BLKELTS to get 2*k+2
01F4 3 06 push es
01F5 3p 1F pop ds ; Synch DS with ES to eliminate overrides
EVEN
01F6 1 66| 4D bufloop: dec ebp ; 2*k+1
01F8 1 66| 8B 05 mov eax,[di] ; Fetch next a[k]
01FB 13+ 66| F7 E6 mul esi ; a[k]*radix
01FE 1 66| 03 C3 add eax,ebx ; a[k]*radix + q
0201 1 66| 13 D1 adc edx,ecx
0204 40 66| F7 F5 div ebp ; q <- (a[k]*radix + q)/(2*k+1)
0207 1 66| 89 15 mov [di],edx ; Store remainder to a[k]
020A 3 66| D1 ED shr ebp,1 ; k
020D 13+ 66| F7 E5 mul ebp ; k*q
0210 1 66| 03 ED add ebp,ebp ; 2*k
0213 1 66| 8B D8 mov ebx,eax ; q <- k*q
0216 1 66| 8B CA mov ecx,edx
0219 1 83 EF 04 sub di,4 ; Point to next a[k]
021C 3,1 75 D8 jne bufloop ; Not last in block? Loop

021E 1 66| 4D dec ebp ; 2*k+1
0220 1 66| 8B 05 mov eax,[di] ; Fetch last a[k]
0223 13+ 66| F7 E6 mul esi ; a[k]*radix
0226 1 66| 03 C3 add eax,ebx ; a[k]*radix + q
0229 1 66| 13 D1 adc edx,ecx
022C 1 66| 4D dec ebp ; 2*k
022E 3,1 74 1A je gotdgts ; k = 0? Then can spit out 9 digits,
; else handle as if still in loop
0230 1 66| 45 inc ebp ; 2*k+1
0232 40 66| F7 F5 div ebp ; q <- (a[k]*radix + q)/(2*k+1)
0235 1 66| 8B DA mov ebx,edx ; a[k] <- remainder
0238 3 66| D1 ED shr ebp,1 ; k
023B 13+ 66| F7 E5 mul ebp ; k*q
023E 3 66| 93 xchg ebx,eax ; q <- k*q
0240 1 66| 8B CA mov ecx,edx
0243 5 66| AB stoback: stosd ; Store a[k] (or a[0])
0245 1 5E pop si
0246 1 5F pop di
0247 3 0E setds: push cs ; Restore DS
0248 3p 1F pop ds
0249 5 C3 ret ; Return ECX:EBX (and EDX if digits)

024A 40 66| F7 F6 gotdgts: div esi ; Digits <- a[0]/radix
024D 1 66| 33 DB xor ebx,ebx ; Zero q for next blkloop (ECX zero)
0250 3 66| 92 xchg eax,edx ; Ready remainder a[0] for store
0252 3 EB EF jmp stoback ; Store a[0], return ECX:EBX and EDX
;-----------------------------------------------------------------------------
; Save/load/append current block to/from work file. Input BP high is 3Fh/40h
; to flag read/write and AX is zero-based block number. For append entry
; point, BX must be work file handle (AX irrelevant--no seek done).
;-----------------------------------------------------------------------------
0254 SvLdBlk:
IF BLKELTS EQ 4000h ; Maximum value allowed
0254 1 BA 1140 mov dx,HDRSIZE ; Low seek offset, AX is high offset
ELSE ; Small test values
ENDIF
0257 3 E8 00DD call WrkSeek ; Seek from BOF--also sets handle BX
025A 3 06 ApndBlk: push es ; Need high BP 40h and BX handle for append
025B 3p 1F pop ds ; Set DS to high buffer segment
025C 1 33 D2 xor dx,dx ; Start offset of high buffer
IF BLKELTS EQ 4000h ; Maximum value allowed
025E 1 B9 8000 mov cx,2*BLKELTS
0261 3 E8 00B8 call File21h ; Half block
0264 3 92 MUV dx,ax ; Bytes read or written to DX--next offset
ELSE ; Small test values
ENDIF
0265 3 E8 00B4 call File21h ; Half block (small full block if testing)
0268 3 EB DD jmp setds ; Restore DS and exit
;-----------------------------------------------------------------------------
; Display OUT file. Format matches Project Gutenberg pi file, with output in
; 1000-digit blocks, 20 lines each, and with 3 blank lines separating blocks
; (so that Vern Buerg's LIST utility can page through output without drift).
; Input is handle BX, zero CX/high EBX and BP odd (OpenFile offset).
;
; Note that output dword may on rare occasions exactly equal Radix. This
; should be handled by expanding the dword to DGTS zeroes and bumping previous
; dword by one (carry). Instead, a colon (ASCII 58) followed by DGTS-1 zeroes
; is output. The colon should be manually replaced with 0 and a carry of 1
; added to the digits left of the colon.
;
; This event is easily noticeable for DGTS < 4, but becomes statistically
; rarer as DGTS increases. For maximum DGTS = 9, the statistical probability
; of such an occurrence in first 5 million digits is under 0.05% and it was
; not, in fact, observed to occur in first 1.25 million digits. The low
; likelihood is "justification" for sloppy handling here (which adds one-byte
; push ax instruction to routine below). See SPIGOT.PRG for small radix
; demo of these overflow events.
;-----------------------------------------------------------------------------
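; The overflow case above can be illustrated with a short Python sketch. It
; is a sketch under stated assumptions, not the author's code:
; expand_like_listing mimics the divide-by-10 expansion used below (so a
; dword equal to RADIX yields a colon), while expand_with_carry shows the
; "proper" handling the comment describes. Both function names are
; hypothetical, and the input is assumed to hold only the fractional dwords
; (the leading 3 is excluded).

```python
DGTS = 9
RADIX = 10**DGTS


def expand_like_listing(dword):
    """DGTS-1 divides by 10; the final quotient may be 10 (prints ':')."""
    digits = []
    for _ in range(DGTS - 1):
        dword, r = divmod(dword, 10)
        digits.append(r)                 # remainders come out low digit first
    digits.append(dword)                 # final quotient, normally 0-9
    return "".join(chr(ord("0") + d) for d in reversed(digits))


def expand_with_carry(dwords):
    """Propagate a carry of 1 leftward whenever a dword reaches RADIX."""
    fixed = list(dwords)
    for i in range(len(fixed) - 1, 0, -1):
        if fixed[i] >= RADIX:
            fixed[i] -= RADIX
            fixed[i - 1] += 1
    return "".join(f"{d:0{DGTS}d}" for d in fixed)


print(expand_like_listing(RADIX))             # :00000000
print(expand_with_carry([123456788, RADIX]))  # 123456789000000000
```

; The second function works from the last dword toward the first, so a carry
; into a dword holding 999999999 ripples further left.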
026A 1 53 dumploop: push bx ; Save so no OutHandl reference needed
026B 1 BF 4500 mov di,OFFSET DgtBuf ; Expand dwords to digits
026E 1 57 push di ; Save for digit source later
026F 1 8B F2 mov si,dx ; OutBuf from file read
0271 1 B3 0A mov bl,10 ; Fixed divisor--BH zero from handle and
; high EBX zeroed earlier
0273 3 91 MUV cx,ax ; Number of source dwords
0274 3 D1 ED shr bp,1 ; BP odd before Dump call
0276 1 BD 011A R mov bp,OFFSET OutChr ; Function pointer (even) for duration
0279 3,1 73 0E jnc exploop ; Not first pass? Ahead

027B 1 49 dec cx ; Should still be non-zero
027C 5 66| AD lodsd ; First dword should be binary 3
027E 1 04 30 add al,"0"
0280 5p FF D5 call bp ; Output expected 3 then dot
0282 1 B0 2E mov al,"."
0284 5p FF D5 call bp
0286 3 E8 FE99 call OutCrLf2 ; Pair of CrLfs
0289 1 51 exploop: push cx
028A 5 66| AD lodsd ; Load dword from buffer then extract 9
028C 1 B9 0008 mov cx,DGTS-1 ; decimal digits--last one specially
IF DGTS GT 1
028F 3 66| 99 pushlp: cdq ; Zeroes EDX since EAX less equal Radix,
0291 40 66| F7 F3 div ebx ; but divide overflow possible if bad file
0294 1 52 push dx ; Save remainder, since must reverse order
0295 7,6 E2 F8 loop pushlp
ENDIF
0297 1 50 push ax ; Final quotient--can be 10 in rare cases
0298 1 B1 09 mov cl,DGTS ; CH zero
029A 1 58 poplp: pop ax ; Pop digits in reverse order
029B 1 04 30 add al,"0" ; Make ASCII digit (or rare colon)
029D 5 AA stosb ; Store to digit buffer
029E 7,6 E2 FA loop poplp

02A0 1 59 pop cx ; Dwords left
02A1 7,6 E2 E6 loop exploop ; Not done expanding? Loop

02A3 1 5E pop si ; DgtBuf start for output pass
02A4 1 5B pop bx ; Restore handle
02A5 1 8B CF mov cx,di
02A7 1 2B CE sub cx,si ; Total digits to handle this pass
02A9 5 AC chrloop: lodsb ; Fetch next digit--CX ok through loop
02AA 5p FF D5 call bp ; Display
02AC 3 FE 0E 04FA R dec Cnt10
02B0 3,1 75 27 jne nextchr ; Not end of 10-digit block? Ahead

02B2 1 C6 06 04FA R 0A mov Cnt10,10 ; Reset 10-digit counter
02B7 3 FE 0E 04FF R dec Cnt5
02BB 3,1 74 06 je chkblk ; Last digit in line? Ahead--no space

02BD 1 B0 20 mov al," "
02BF 5p FF D5 call bp ; Space to separate 10-digit blocks
02C1 3 EB 16 jmp nextchr ; To next digit

02C3 1 C6 06 04FF R 05 chkblk: mov Cnt5,5 ; Reset 5-block counter
02C8 3 E8 FE5A call OutCrLf ; Output end-of-line CrLf
02CB 3 FE 0E 04FE R dec Cnt20
02CF 3,1 75 08 jne nextchr ; Not last line in 20-line group? Ahead

02D1 1 C6 06 04FE R 14 mov Cnt20,20 ; Reset 20-line counter
02D6 3 E8 FE46 call OutCrLf3 ; Output 3 more CrLfs
02D9 7,6 E2 CE nextchr: loop chrloop ; Not end of buffer? Loop

02DB 1 BA 0500 Dump: mov dx,OFFSET OutBuf ; Handle EBX, zero CX, and odd BP input
02DE 1 B5 40 mov ch,OUTSIZE/256 ; Full buffer (fewer if near EOF)
02E0 1 B4 3F mov ah,3Fh ; DOS read--use Int21h to allow short read
02E2 3 E8 0047 call Int21h ; Returns bytes-read AX, if no abort
02E5 2 C1 E8 02 shr ax,2 ; Bytes-read to dword count
02E8 3,1 75 80 jne dumploop ; Some dwords? Loop, else fall harmlessly
; to return and PSP exit...
;-----------------------------------------------------------------------------
; Store three nested loop counters to message buffer. Each is decremented to
; make zero-based. AX input (decremented counter). DX message offset output.
;
; On fall through from Dump, uninitialized CurBlk/IterCnt have clear high bits
; from ASCII syntax text, so cwd instruction below is ok.
;-----------------------------------------------------------------------------
02EA 1 BB 03E7 R PrepStat: mov bx,OFFSET Message+11
02ED 3 E8 000A call StoDgts+1 ; Store Q counter (already decremented)
02F0 1 A1 0476 R mov ax,CurBlk
02F3 3 E8 0003 call StoDgts ; Decrement and store blkloop counter
02F6 1 A1 0472 R mov ax,IterCnt
02F9 1 48 StoDgts: dec ax ; Make display counter zero-based
02FA 1 BD 000A mov bp,10 ; Divisor
02FD 1 B9 0003 mov cx,3
0300 3 99 sdlp: cwd ; Zeroes DX, since counters small
0301 24 F7 F5 div bp
0303 3 92 xchg ax,dx ; Remainder to AL, save quotient
0304 1 04 30 add al,"0"
0306 1 88 07 mov [bx],al ; Store ASCII digit
0308 1 4B dec bx ; Move leftward
0309 3 92 MUV ax,dx ; Quotient back to AX
030A 7,6 E2 F4 loop sdlp

030C 1 8B D3 mov dx,bx ; On last call, sets DX to message start
030E 1 4B dec bx ; Skip colon on first two calls
030F 5 C3 toret: ret
;-----------------------------------------------------------------------------
; Save/load parm bytes and Q buffer to/from workfile. BP high is 3Fh/40h.
;-----------------------------------------------------------------------------
0310 1 33 C0 SvLdHdr: xor ax,ax
0312 3 99 cwd ; Zero DX
0313 3 E8 0021 call WrkSeek ; Seek from BOF--also sets BX
0316 1 BA 0470 mov dx,OFFSET HdrBuf
0319 1 B9 1140 mov cx,HDRSIZE ; Fall...
;-----------------------------------------------------------------------------
; Int 21h with error check for read/write calls. BP high byte is 3Fh/40h.
;-----------------------------------------------------------------------------
EVEN ; Insure OpenFile offset odd for Dump
031C 1 8B C5 File21h: mov ax,bp ; Set AH, preserving BP
031E 3 E8 000B call Int21h ; Read/write, aborting on set carry
0321 1 3B C1 cmp ax,cx ; Also test for short read/write
0323 3 EB 09 jmp jcabort ; Short? Abort else return
;-----------------------------------------------------------------------------
; Open file, name at DX, returning handle AX.
;-----------------------------------------------------------------------------
0325 1 B8 3D02 OpenFile: mov ax,3D02h ; DOS read/write open
0328 3 EB 02 jmp Int21h
;-----------------------------------------------------------------------------
; Create file, name at DX, returning handle AX. Need input CX zero.
;-----------------------------------------------------------------------------
032A 1 B4 3C MakeFile: mov ah,3Ch ; DOS create, assuming CX zero
032C 30 CD 21 Int21h: int 21h
032E 3,1 73 DF jcabort: jnc toret ; Ok? Out, else fall...
;-----------------------------------------------------------------------------
; Error exit. Just beep and quit.
;-----------------------------------------------------------------------------
0330 1 B0 07 Abort: mov al,7 ; Bell character
0332 3 E8 FDE5 call OutChr ; Output character to console
0335 30 CD 20 int 20h ; Closes files too
0337 ;-----------------------------------------------------------------------------
; Seek AX:DX bytes from BOF in work file. BX is handle on exit.
;-----------------------------------------------------------------------------
= 0338 WrkHandl = $+1 ; Handle is stored in instruction
0337 1 BB 0000 WrkSeek: mov bx,0 ; Handle to BX
033A 3 91 MUV cx,ax ; High seek offset to CX, DX ok
033B 1 B8 4200 mov ax,4200h ; DOS seek from BOF
033E 3 EB EC jmp Int21h ; Int 21h with set carry abort
;-----------------------------------------------------------------------------
; Check 386 and RAM, setting AX to segment 128K forward, possibly aborting.
; Routine also zeroes EBX/CX and sets pointer BP for Display/Resume for use
; later. The 386 check follows Intel-recommended method (see CPUID.ZIP) and,
; as side effect, virtually ensures DOS 2.0+ for file I/O. I doubt if DOS
; 1.x is running on any 386 machines.
;-----------------------------------------------------------------------------
0340 1 BA F000 Check: mov dx,0F000h ; Constant for use a few times below
0343 4p 9C pushf ; Try to clear bits 12-15 in CPU flags,
0344 1 5B pop bx ; which are always set on 8086/8088
0345 1 B8 0FFF mov ax,0FFFh
0348 1 23 C3 and ax,bx ; Clear bits 12-15
034A 1 50 push ax
034B 9p 9D popf ; Back to flags
034C 4p 9C pushf
034D 1 58 pop ax ; Back to AX
034E 1 23 C2 and ax,dx ; If bits 12-15 are set, then
0350 1 3B C2 cmp ax,dx ; CPU is an 8086/8088
0352 3,1 74 DC je Abort ; Yes? Abort

0354 1 0B DA or bx,dx ; Try to set bits 12-15 in CPU flags,
0356 1 53 push bx ; which are always clear on 80286
0357 9p 9D popf
0358 4p 9C pushf
0359 1 58 pop ax ; If bits 12-15 are cleared, then
035A 1 23 C2 and ax,dx ; CPU is an 80286
035C 3,1 74 D2 je Abort ; Yes? Abort

035E 1 BD 0325 R mov bp,OFFSET OpenFile ; Function pointer for Display/Resume
0361 1 33 C9 xor cx,cx ; Used later at various points
0363 1 66| 33 DB xor ebx,ebx ; Used later--386 instructions ok now
0366 3 8C C8 mov ax,cs ; Point to segment 128K forward past 64K
0368 1 05 2000 add ax,2000h ; spigot buffer and compare to segment
036B 2 39 47 02 cmp [bx+2],ax ; after COM allocation (PSP)
036E 3 EB BE jmp jcabort ; No room? Abort, else return
;-----------------------------------------------------------------------------
; From CX/EBX zero and SI pointing to left digit, get command-line n. Abort
; if not 1-1000. Limit is artificially set for 3-digit display (but higher
; values ok). From n, set parameters MaxBlk/IterCnt/QLast and zero CurBlkD.
; Zero Q buffer, which holds quadwords for intermediate q values. Store dword
; 2's to entire 64K high segment. Finally, create files. SI holds n on exit.
;
; Total dwords output is set to n*16384*log(2)/9 + 1, truncated to an integer.
; Total decimal digit dwords output (excluding ordinal 3 dword) is one less,
; or roughly 4932*n decimal digits. This is closest to accuracy limit of
; about n*16384*log(2) + 0.5*log(n) + 1.5576 decimal places when truncated
; fraction is small. Since less than full accuracy is output, the result will
; be, at worst, off by one unit in the last decimal place. The accuracy slack
; makes even this error unlikely. See SPIGOT.PRG for FoxPro demo of algorithm
; error bound.
;
; The fudging below in FACTOR over allowed n range adds at worst 0.3 decimal
; digits to the dword output setting. This is well under the slack term
; 0.5*log(n) + 1.5576. The fudge works since the high and low words of
; LOGTWO are approximately equal. Part of the fun in writing this program
; was minimizing code size without hurting performance or functionality.
;
; Making last array dword 4 initially (or, equivalently, setting first Q
; buffer quadword below to 2*RADIX) would improve the limit slightly, but
; would not affect the first order term used in setting dword total.
;-----------------------------------------------------------------------------
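; As a cross-check of the arithmetic described above, here is a minimal
; Python sketch of how MaxBlk, IterCnt and QLast follow from n. It uses the
; FACTOR (after fudging) and QELTS values shown in this listing. The function
; name run_parameters is hypothetical.

```python
BLKPWR = 14          # log2 of dwords per 64K block
FACTOR = 0x0890089D  # LOGTWO/DGTS with the low word fudged, per listing
QELTS = 0x0225       # 549 dwords output per iteration


def run_parameters(n):
    """Return (MaxBlk, IterCnt, QLast) for command-line value n."""
    elts = n << BLKPWR                  # total array dwords
    dwords = (elts * FACTOR) >> 32      # high dword of product (EDX)
    quot, rem = divmod(dwords, QELTS)   # 16-bit div by QELTS in the listing
    return n, quot + 1, rem + 1         # extra iteration holds remainder + 1


print(run_parameters(4))    # (4, 4, 546)
print(run_parameters(554))  # (554, 553, 549)
```

; For n = 4 this gives 3*549 + 546 = 2193 output dwords, that is, the leading
; 3 plus 2192 dwords of 9 digits each (19,728 digits).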
= 089D TEMP = HDRSIZE/2-3 ; CX value used below to clear Q buffer--
= 0895 LOFAC = FACTOR MOD 65536 ; very close to LOFAC at maximum BLKPWR
= 0890 HIFAC = FACTOR/65536 ; for all DGTS but 5 and 7
= 0008 TINY = FACTOR SHR 24 ; Limit adjustment to 1 part in 16 million

= FFFFFFFF FUDGING = TEMP-LOFAC LE TINY AND LOFAC-TEMP LE TINY
IF FUDGING ; Adjust FACTOR for dual CX/ECX setting
= 0890089D FACTOR = HIFAC*65536+TEMP
ENDIF
0370 3 98 getloop: cbw ; Zero AH
0371 3 91 xchg ax,cx ; Accumulation to AX, binary digit to CX
0372 1 B3 0A mov bl,10 ; EBX zeroed earlier
0374 13+ F7 E3 mul bx ; Accumulation times 10
0376 1 03 C8 add cx,ax ; Sum to CX for new accumulation
0378 5 AC MiscInit: lodsb ; Get next digit
0379 1 2C 30 sub al,"0" ; ASCII to binary
037B 3,1 72 04 jb gotnum ; Exit at first non-digit

037D 1 3C 09 cmp al,9
037F 3,1 76 EF jbe getloop ; Loop if binary 0-9

0381 3 91 gotnum: MUV ax,cx ; Set AX to n
0382 1 48 dec ax
0383 1 3D 03E8 cmp ax,NMAX ; Insure n is 1-1000
0386 3,1 73 A8 jae Abort ; No? Abort (also occurs if no digits)

0388 1 40 inc ax ; Back to n
0389 1 8B F0 mov si,ax ; Save as exit value for initial write loop
038B 3 06 push es ; Synch ES/DS for stosw's that follow
038C 3 1E push ds
038D 3p 07 pop es
038E 1 BF 0470 R mov di,OFFSET MaxBlk ; Point to first header parm
0391 5 AB stosw ; MaxBlk--n initially
0392 3 66| 98 cwde ; Zero high EAX word (n small)
0394 2 66| C1 E0 0E shl eax,BLKPWR ; BLKELTS times n yields total array dwords
0398 1 66| B9 0890089D mov ecx,FACTOR ; LOGTWO/DGTS, possibly fudged small amount
039E 13+ 66| F7 E1 mul ecx ; Sets EDX to number of decimal dwords to
03A1 1 66| 52 push edx ; output, bumped below to account for
03A3 1 58 pop ax ; initial ordinal 3 dword
03A4 1 5A pop dx ; DX:AX (plus one) now dwords to output
03A5 1 BB 0225 mov bx,QELTS ; Dwords output per iteration
03A8 24 F7 F3 div bx ; Divide to get iteration count
03AA 1 40 inc ax ; Bump for extra remainder iteration
03AB 5 AB stosw ; IterCnt--MaxBlk or just under by design
03AC 3 92 MUV ax,dx ; Remainder to AX and bump once off possible
03AD 1 40 inc ax ; zero (and accounting for ordinal 3)
03AE 5 AB stosw ; QLast--dwords output on last iteration
IF NOT FUDGING ; Else CX already HDRSIZE/2-3 from FACTOR
ENDIF
03AF 1 66| 33 C0 xor eax,eax ; Zero rest of parm space and Q buffer,
03B2 5n F3/ AB rep stosw ; leaving DI at dword offset afterward
03B4 3p 07 pop es ; Restore high buffer segment
03B5 1 49 dec cx ; CX was zero--repetition overkill, but ok
03B6 1 B0 02 mov al,2 ; Store dword 2's to high buffer, starting
03B8 5n F3/ 66| AB rep stosd ; somewhere in middle--CX zero afterward
03BB 1 BD 032A R mov bp,OFFSET MakeFile ; Fall, creating files...
;-----------------------------------------------------------------------------
; Open work file and output file, setting handles. Input BP must point to
; either MakeFile or OpenFile above. Second entry point used to open just
; output file for DISPLAY option. Output handle returned in BX.
;
; Handles are stored in instruction operands. This is ok (for current Intel
; CPUs), since any jump/call/ret flushes prefetch queue. If this is not
; to your taste, you may store the handles, e.g., at the end of the PSP and
; adjust the two target instructions from immediate loads to memory loads.
; This will add a few bytes to COM file size.
;-----------------------------------------------------------------------------
03BE 1 BA 044E R OpenTwo: mov dx,OFFSET WrkFile
03C1 5p FF D5 call bp
03C3 1 A2 0338 R mov BYTE PTR WrkHandl,al
03C6 1 BA 045C R OpenOne: mov dx,OFFSET OutFile
03C9 5p FF D5 call bp
03CB 1 A2 01BE R mov BYTE PTR OutHandl,al

03CE 3 93 MUV bx,ax ; Output handle to BX
03CF 5 C3 ret
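;-----------------------------------------------------------------------------
; Sketch of the memory-load alternative mentioned above. It is not part of
; the assembled program. HndlMem is a hypothetical name, PSP offset 0FEh is
; an assumed free spot, BX is an assumed target register, and DS is assumed
; to address the PSP at both the store and the load:
;
;   HndlMem EQU  0FEh              ; Assumed unused word at end of PSP
;           mov  ds:[HndlMem],ax   ; In place of mov BYTE PTR WrkHandl,al
;           mov  bx,ds:[HndlMem]   ; In place of immediate load at WrkHandl
;
; A word is stored here rather than a byte, so the high byte of BX is
; defined after the load. The output handle would need its own word.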
;-----------------------------------------------------------------------------
; Local data and small buffers next. Main 64K buffer has own segment.
;-----------------------------------------------------------------------------
03D0 4F 55 54 20 57 52 4B 20 MsgHdr DB "OUT WRK RAM",10 ; Header for display counters
52 41 4D 0A
03DC 0D 20 30 31 2D 4F 63 74 Message DB 13," 01-Oct-94 $" ; Status buffer with program date initially
2D 39 34 20 24
03E9 0D 0A 53 79 6E 74 61 78 Syntax DB 13,10,"Syntax: PI386 [n | RESUME | DISPLAY] where n=1-1000"
3A 20 50 49 33 38
36 20 5B 6E 20 7C
20 52 45 53 55 4D
45 20 7C 20 44 49
53 50 4C 41 59 5D
20 77 68 65 72 65
20 6E 3D 31 2D 31
30 30 30
041E 0D 0A 0A 46 69 6E 64 73 DB 13,10,10,"Finds 4932*n digits of pi by spigot method. "
20 34 39 33 32 2A
6E 20 64 69 67 69
74 73 20 6F 66 20
70 69 20 62 79 20
73 70 69 67 6F 74
20 6D 65 74 68 6F
64 2E 20 20
044E 50 49 33 38 36 2E 57 52 WrkFile DB "PI386.WRK",0,"and "
4B 00 61 6E 64 20
045C 50 49 33 38 36 2E 4F 55 OutFile DB "PI386.OUT",0
54 00
0466 63 72 65 61 74 65 64 2E Free DB "created.",13,10,"Need 66*nK disk/128K RAM/386 CPU. "
0D 0A 4E 65 65 64
20 36 36 2A 6E 4B
20 64 69 73 6B 2F
31 32 38 4B 20 52
41 4D 2F 33 38 36
20 43 50 55 2E 20
20
0493 54 69 6D 65 20 67 72 6F DB "Time grows as n*(n+1)/2. Keypress halts",13,10,"run for "
77 73 20 61 73 20
6E 2A 28 6E 2B 31
29 2F 32 2E 20 20
4B 65 79 70 72 65
73 73 20 68 61 6C
74 73 0D 0A 72 75
6E 20 66 6F 72 20
04C5 6C 61 74 65 72 20 72 65 DB "later restart. Details in ASM file--Craig R. Hessel."
73 74 61 72 74 2E
20 20 44 65 74 61
69 6C 73 20 69 6E
20 41 53 4D 20 66
69 6C 65 2D 2D 43
72 61 69 67 20 52
2E 20 48 65 73 73
65 6C 2E
04FA 0A Cnt10 DB 10 ; Digit count within block--also LF
04FB 0D 0A 24 CrLf DB 13,10,"$" ; CrLf and counts used in DISPLAY option
04FE 14 Cnt20 DB 20 ; Line count for 1000 digits
04FF 05 Cnt5 DB 5 ; Block count within line
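; Taken together, the three DISPLAY counts above multiply out as 10 digits
; per block * 5 blocks per line * 20 lines = 1000 digits, i.e., 50 digits
; on each output line.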
0500
= 0500 LOC = OFFSET $-OFFSET Begin+100h
= 0000 TEMP = LOC MOD 16
= 0000 TEMP = (16-TEMP) MOD 16 ; Paragraph align OutBuf after Syntax

= 0500 OutBuf = LOC+TEMP ; Read buffer for DISPLAY option only
= 4500 DgtBuf = OutBuf+OUTSIZE ; Digit buffer for DISPLAY option only
;-----------------------------------------------------------------------------
; Following parm data/buffers overwrite tail of syntax message above. These
; parms, along with Q buffer array and spigot array, preserve program state
; between restarts. All are saved in work file. Ensure PARMSIZE equate at
; BOF is at least as large as parms here (10 bytes). Order of parms is
; specific for initialization in MiscInit. Without overwrite of syntax, ORG
; statements below would extend COM file size.
;-----------------------------------------------------------------------------
= 0466 LOC = OFFSET Free-OFFSET Begin+100h
= 0006 TEMP = LOC MOD 16
= 000A TEMP = (16-TEMP) MOD 16 ; Paragraph align HdrBuf inside Syntax

= 0470 HdrBuf = LOC+TEMP ; Up to PARMSIZE bytes of header data
= 0480 QBuf = HdrBuf+PARMSIZE ; Q values, I/O for each main buffer pass
= 15B0 PBuf = HdrBuf+HDRSIZE ; Pi digits, output as dwords
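; Worked values for the equates above, taken from this listing:
;   LOC    = 0466h                   (offset of Free)
;   TEMP   = 0466h MOD 16 = 6
;   TEMP   = (16-6) MOD 16 = 10 = 0Ah
;   HdrBuf = 0466h + 0Ah = 0470h     (next paragraph boundary)
; The outer MOD 16 keeps the pad at zero when LOC is already aligned, as
; with LOC = 0500h for OutBuf earlier. The listed addresses also imply
; PARMSIZE = 0480h-0470h = 10h and HDRSIZE = 15B0h-0470h = 1140h.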

ORG HdrBuf
0470 MaxBlk LABEL WORD ; Maximum block referenced--decreases
ORG HdrBuf+2
0472 IterCnt LABEL WORD ; Iteration count decremented to zero
ORG HdrBuf+4
0474 QLast LABEL WORD ; Q (and P) buffer elements in last
ORG HdrBuf+6 ; iteration (at most QELTS)
0476 CurBlk LABEL WORD ; Current file block
0476 CurBlkD LABEL DWORD ; Referenced once as dword (high word zero)

0476 Code_Seg ENDS
END Begin

