Dec 05 2017

RSX DPMI-extender for EMX and DJGPP apps rel 5.

File Name | File Size | Zip Size | Zip Type |
---|---|---|---|

RSX | 0 | 0 | stored |

BIN | 0 | 0 | stored |

BINDDJ.EXE | 10370 | 7340 | deflated |

DEB.EXE | 69636 | 31995 | deflated |

DPMIFUN.EXE | 53252 | 20768 | deflated |

DPMIINFO.EXE | 28676 | 11222 | deflated |

EMXLDPMI.EXE | 1324 | 571 | deflated |

PAKRSX.BAT | 69 | 60 | deflated |

RSX.EXE | 99724 | 48927 | deflated |

RSXOPT.EXE | 11402 | 7922 | deflated |

STUBDJ.EXE | 6204 | 4740 | deflated |

DOC | 0 | 0 | stored |

COPYING | 18321 | 6637 | deflated |

COPYING.RSX | 1505 | 685 | deflated |

DPMILB11.TXT | 15576 | 4637 | deflated |

FAQ.TXT | 7625 | 3089 | deflated |

HISTORY.TXT | 4694 | 2094 | deflated |

INSTALL.TXT | 16154 | 5982 | deflated |

KRNLDEB.TXT | 1830 | 973 | deflated |

README.TXT | 2238 | 1054 | deflated |

RSX4EMX.TXT | 4426 | 1634 | deflated |

SOURCE | 0 | 0 | stored |

ADOSX32.H | 499 | 199 | deflated |

ASM16 | 0 | 0 | stored |

ADOSX32.ASM | 8486 | 2368 | deflated |

COPY32.ASM | 6412 | 1136 | deflated |

DPMI10.ASM | 8019 | 1549 | deflated |

DPMI16.ASM | 23238 | 4346 | deflated |

EXCEP32.ASM | 8811 | 2514 | deflated |

FPU.ASM | 2051 | 796 | deflated |

REGS386.INC | 954 | 227 | deflated |

SWITCH.ASM | 1915 | 696 | deflated |

TRANS.INC | 245 | 98 | deflated |

ASM32 | 0 | 0 | stored |

ADOSX32.S | 6880 | 1552 | deflated |

COPY32.S | 3172 | 569 | deflated |

CRT0.S | 850 | 350 | deflated |

DPMI.C | 14920 | 2354 | deflated |

DPMI10.S | 6894 | 1479 | deflated |

EXCEP32.S | 7165 | 2099 | deflated |

FPU.C | 1595 | 614 | deflated |

REGS386.INC | 1591 | 398 | deflated |

BUILD | 0 | 0 | stored |

BUILD | 50 | 48 | deflated |

CDOSX32.C | 19329 | 4950 | deflated |

CDOSX32.H | 435 | 243 | deflated |

COPY32.H | 746 | 289 | deflated |

DEB | 0 | 0 | stored |

ANSI.H | 229 | 123 | deflated |

BREAKP.C | 2234 | 648 | deflated |

BREAKP.H | 392 | 193 | deflated |

COFF.H | 9555 | 3194 | deflated |

DEB.C | 14078 | 4107 | deflated |

DPMI.C | 7454 | 2129 | deflated |

DPMI.H | 74 | 65 | deflated |

INPUT.C | 5104 | 1476 | deflated |

MAKEFILE | 391 | 236 | deflated |

STAB.H | 1151 | 495 | deflated |

SYMS.C | 19085 | 5114 | deflated |

SYMS.H | 609 | 269 | deflated |

UNASSMBL.C | 27512 | 7889 | deflated |

UNASSMBL.H | 208 | 113 | deflated |

DJIO.C | 14134 | 4085 | deflated |

DJIO.H | 317 | 146 | deflated |

DJLIBRSX | 0 | 0 | stored |

FORK.S | 137 | 107 | deflated |

GETPID.S | 107 | 78 | deflated |

GETPPID.S | 118 | 80 | deflated |

KILL.S | 176 | 127 | deflated |

LIBRSX.A | 5354 | 1606 | deflated |

MAKEFILE | 359 | 177 | deflated |

PTRACE.S | 338 | 182 | deflated |

RAISE.S | 167 | 123 | deflated |

README | 297 | 165 | deflated |

SIGNAL.S | 221 | 144 | deflated |

SPAWNVE.C | 4533 | 1108 | deflated |

WAIT.S | 175 | 126 | deflated |

_PROCESS.H | 1118 | 294 | deflated |

_PTRACE.H | 533 | 206 | deflated |

_REG.H | 467 | 174 | deflated |

_SIGNAL.H | 657 | 267 | deflated |

_USER.H | 789 | 263 | deflated |

DOSERRNO.C | 7829 | 2091 | deflated |

DOSERRNO.H | 831 | 303 | deflated |

DPMI | 0 | 0 | stored |

DPMI.H | 8174 | 2558 | deflated |

DPMI10.H | 2516 | 780 | deflated |

DPMITYPE.H | 147 | 70 | deflated |

DPMIUTIL.C | 3520 | 1104 | deflated |

DPMIFUN.C | 8690 | 2790 | deflated |

DPMIFUN.H | 7503 | 1323 | deflated |

DPMIFUN2.S | 399 | 202 | deflated |

DPMIINFO.C | 4226 | 1506 | deflated |

INPUT.C | 5104 | 1476 | deflated |

MAKEFILE | 968 | 329 | deflated |

EXCEP32.H | 1466 | 324 | deflated |

EXTERNA.H | 5269 | 1010 | deflated |

FPU-EMU | 0 | 0 | stored |

BUILD | 0 | 0 | stored |

CONTROL_.H | 1815 | 602 | deflated |

CRT0FPU.S | 3932 | 1359 | deflated |

CRT0FPUW.S | 1980 | 722 | deflated |

DIV_SMAL.S | 1492 | 440 | deflated |

ERRORS.C | 16569 | 4309 | deflated |

EXCEPTIO.H | 1798 | 659 | deflated |

FPU-EMU.RSP | 565 | 160 | deflated |

FPU_ARIT.C | 3698 | 569 | deflated |

FPU_ASM.H | 934 | 314 | deflated |

FPU_AUX.C | 3862 | 1233 | deflated |

FPU_EMU.H | 5824 | 1985 | deflated |

FPU_ENTR.C | 19417 | 5375 | deflated |

FPU_ETC.C | 2998 | 865 | deflated |

FPU_PROT.H | 3973 | 844 | deflated |

FPU_SYST.H | 2659 | 1026 | deflated |

FPU_TRIG.C | 40832 | 7657 | deflated |

GET_ADDR.C | 8015 | 1952 | deflated |

INCLUDE | 0 | 0 | stored |

ASM | 0 | 0 | stored |

SEGMENT.H | 4775 | 1146 | deflated |

LINUX | 0 | 0 | stored |

CONFIG.H | 806 | 241 | deflated |

KERNEL.H | 338 | 208 | deflated |

LINKAGE.H | 140 | 84 | deflated |

MATH_EMU.H | 748 | 339 | deflated |

SCHED.H | 636 | 273 | deflated |

SEGMENT.H | 150 | 82 | deflated |

SIGNAL.H | 1384 | 588 | deflated |

STDDEF.H | 243 | 149 | deflated |

LOAD_STO.C | 7284 | 1771 | deflated |

MAKEFILE | 1475 | 478 | deflated |

POLYNOMI.S | 3736 | 1034 | deflated |

POLY_2XM.C | 2811 | 1058 | deflated |

POLY_ATA.C | 6012 | 1950 | deflated |

POLY_DIV.S | 2304 | 487 | deflated |

POLY_L2.C | 7961 | 2429 | deflated |

POLY_MUL.S | 1739 | 484 | deflated |

POLY_SIN.C | 4580 | 1480 | deflated |

POLY_TAN.C | 5023 | 1370 | deflated |

PRINTK.C | 6890 | 2095 | deflated |

PRINTK.H | 4218 | 539 | deflated |

README | 16031 | 6103 | deflated |

REG_ADD_.C | 7711 | 1447 | deflated |

REG_COMP.C | 8040 | 1603 | deflated |

REG_CONS.C | 3397 | 913 | deflated |

REG_CONS.H | 1167 | 285 | deflated |

REG_DIV.S | 5464 | 1539 | deflated |

REG_LD_S.C | 37024 | 7255 | deflated |

REG_MUL.C | 3266 | 989 | deflated |

REG_NORM.S | 3232 | 801 | deflated |

REG_ROUN.S | 17675 | 4273 | deflated |

REG_U_AD.S | 4180 | 1412 | deflated |

REG_U_DI.S | 12190 | 2973 | deflated |

REG_U_MU.S | 3764 | 1146 | deflated |

REG_U_SU.S | 6387 | 1888 | deflated |

STATUS_W.H | 2483 | 840 | deflated |

VERIFY.C | 71 | 66 | deflated |

VERSION.H | 844 | 208 | deflated |

WM_SHRX.S | 6156 | 1241 | deflated |

WM_SQRT.S | 10954 | 2923 | deflated |

FPU.H | 371 | 202 | deflated |

FS.C | 13439 | 3126 | deflated |

FS.H | 2610 | 863 | deflated |

GNUAOUT.H | 2359 | 836 | deflated |

INDENTC.BAT | 40 | 40 | stored |

KDEB.C | 9728 | 2490 | deflated |

KDEB.H | 318 | 179 | deflated |

LIBC.C | 10763 | 2478 | deflated |

LOADER | 0 | 0 | stored |

CRT1.ASM | 1135 | 486 | deflated |

LOAD2.ASM | 6822 | 1900 | deflated |

LOADER.C | 11767 | 3611 | deflated |

LOADER.EXE | 4488 | 2489 | deflated |

LOADER.H | 3939 | 1345 | deflated |

LOADER.MAP | 538 | 236 | deflated |

MAKEFILE | 522 | 264 | deflated |

SLIBCE.LIB | 2575 | 1159 | deflated |

LOADPRG.C | 17104 | 4752 | deflated |

LOADPRG.H | 342 | 198 | deflated |

MAKEFILE | 2333 | 826 | deflated |

MAKEFILE.MSC | 1852 | 673 | deflated |

MAKEFILE.WAT | 1895 | 684 | deflated |

OFILES.RSP | 394 | 139 | deflated |

PRINTF.C | 7801 | 2295 | deflated |

PRINTF.H | 162 | 112 | deflated |

PROCESS.C | 22635 | 6249 | deflated |

PROCESS.H | 6867 | 2042 | deflated |

PROCESS.S | 25127 | 4873 | deflated |

PTRACE.C | 4270 | 1239 | deflated |

PTRACE.H | 455 | 191 | deflated |

RMLIB.C | 11171 | 2256 | deflated |

RMLIB.H | 3214 | 970 | deflated |

RSX.C | 10370 | 3356 | deflated |

RSX.H | 792 | 322 | deflated |

RSXDEB.BAT | 309 | 150 | deflated |

SAMPLE | 0 | 0 | stored |

ALLOC.C | 7354 | 2072 | deflated |

EXEC.C | 2363 | 954 | deflated |

FORK.C | 2840 | 1077 | deflated |

FPE.C | 466 | 188 | deflated |

PIPE.C | 1707 | 663 | deflated |

SIGNALS.C | 16771 | 4602 | deflated |

SIGNALS.H | 1170 | 512 | deflated |

SIGNALS3.C | 16888 | 4705 | deflated |

START32.C | 12856 | 3375 | deflated |

START32.H | 893 | 374 | deflated |

STATEMX.C | 3776 | 1144 | deflated |

STATEMX.H | 836 | 351 | deflated |

STUB | 0 | 0 | stored |

BINDDJ.C | 2457 | 830 | deflated |

EMXLDPMI.ASM | 10045 | 3727 | deflated |

EMXLDPMI.EXE | 1324 | 571 | deflated |

EMXLDPMI.MAK | 182 | 93 | deflated |

RSXOPT.C | 5586 | 1678 | deflated |

STUBDJ.C | 1685 | 762 | deflated |

SYSDEP.C | 5080 | 1842 | deflated |

SYSDEP.H | 293 | 167 | deflated |

SYSDEP2.C | 1376 | 552 | deflated |

SYSDJ.C | 14885 | 4367 | deflated |

SYSEMX.C | 26297 | 7144 | deflated |

TERMIO.C | 12799 | 3601 | deflated |

TERMIO.H | 3660 | 1011 | deflated |

TIMEDOS.C | 3799 | 1095 | deflated |

TIMEDOS.H | 851 | 322 | deflated |

VERSION.H | 284 | 165 | deflated |

TPCREAD.ME | 199 | 165 | deflated |

# Download File DPMIGCC5.ZIP Here

## Contents of the README file

The library LIBRSX.A contains signal, spawnve and ptrace support for DJGPP.

Copy the files

LIBRSX.A -> \djgpp\lib

_PROCESS.H -> \djgpp\include\sys

_USER.H -> \djgpp\include\sys

_PTRACE.H -> \djgpp\include\sys

_REG.H -> \djgpp\include\sys

_SIGNAL.H -> \djgpp\include\sys

+---------------------------------------------------------------------------+

| wm-FPU-emu an FPU emulator for 80386 and 80486SX microprocessors. |

| |

| Copyright (C) 1992,1993,1994 |

| W. Metzenthen, 22 Parker St, Ormond, Vic 3163, |

| Australia. E-mail [email protected] |

| |

| This program is free software; you can redistribute it and/or modify |

| it under the terms of the GNU General Public License version 2 as |

| published by the Free Software Foundation. |

| |

| This program is distributed in the hope that it will be useful, |

| but WITHOUT ANY WARRANTY; without even the implied warranty of |

| MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |

| GNU General Public License for more details. |

| |

| You should have received a copy of the GNU General Public License |

| along with this program; if not, write to the Free Software |

| Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. |

| |

+---------------------------------------------------------------------------+

wm-FPU-emu is an FPU emulator for Linux. It is derived from wm-emu387

which is my 80387 emulator for djgpp (gcc under msdos); wm-emu387 was

in turn based upon emu387 which was written by DJ Delorie for djgpp.

The interface to the Linux kernel is based upon the original Linux

math emulator by Linus Torvalds.

My target FPU for wm-FPU-emu is that described in the Intel486

Programmer's Reference Manual (1992 edition). Unfortunately, numerous

facets of the functioning of the FPU are not well covered in the

Reference Manual. The information in the manual has been supplemented

with measurements on real 80486's. Unfortunately, it is simply not

possible to be sure that all of the peculiarities of the 80486 have

been discovered, so there are always likely to be obscure differences

in the detailed behaviour of the emulator and a real 80486.

wm-FPU-emu does not implement all of the behaviour of the 80486 FPU.

See "Limitations" later in this file for a list of some differences.

Please report bugs, etc to me at:

[email protected]

or at:

[email protected]

--Bill Metzenthen

March 1994

----------------------- Internals of wm-FPU-emu -----------------------

Numeric algorithms:

(1) Add, subtract, and multiply. Nothing remarkable in these.

(2) Divide has been tuned to get reasonable performance. The algorithm

is not the obvious one which most people seem to use, but is designed

to take advantage of the characteristics of the 80386. I expect that

it has been invented many times before I discovered it, but I have not

seen it. It is based upon one of those ideas which one carries around

for years without ever bothering to check it out.

(3) The sqrt function has been tuned to get good performance. It is based

upon Newton's classic method. Performance was improved by capitalizing

upon the properties of Newton's method, and the code is once again

structured taking account of the 80386 characteristics.

(4) The trig, log, and exp functions are based in each case upon quasi-

"optimal" polynomial approximations. My definition of "optimal" was

based upon getting good accuracy with reasonable speed.

(5) The argument reducing code for the trig function effectively uses

a value of pi which is accurate to more than 128 bits. As a consequence,

the reduced argument is accurate to more than 64 bits for arguments up

to a few pi, and accurate to more than 64 bits for most arguments,

even for arguments approaching 2^63. This is far superior to an

80486, which uses a value of pi which is accurate to 66 bits.
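As an illustration of item (3) above, here is a minimal sketch of Newton's method applied to a square root, written as an integer square root in portable C. It is not the emulator's code (the archive listing shows the tuned routine in WM_SQRT.S); the function name `isqrt64` and the choice of starting guess are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Integer square root by Newton's method: returns floor(sqrt(n)).
 * Hypothetical helper for illustration; not taken from wm-FPU-emu. */
uint64_t isqrt64(uint64_t n)
{
    if (n < 2)
        return n;

    /* Count the significant bits of n. */
    int bits = 0;
    for (uint64_t t = n; t != 0; t >>= 1)
        bits++;

    /* Start from a power of two that is >= sqrt(n), so that every
     * Newton step moves the estimate downwards towards the root. */
    uint64_t x = (uint64_t)1 << ((bits + 1) / 2);

    /* Newton step for f(x) = x*x - n is x' = (x + n/x) / 2.
     * Stop as soon as the estimate no longer decreases. */
    uint64_t y = (x + n / x) / 2;
    while (y < x) {
        x = y;
        y = (x + n / x) / 2;
    }

    assert(x * x <= n);   /* x never exceeds the true root */
    return x;
}
```

For example, `isqrt64(1000000)` returns 1000. Starting above the root is a design choice: the sequence then decreases monotonically, which gives a simple termination test.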

The code of the emulator is complicated slightly by the need to

account for a limited form of re-entrancy. Normally, the emulator will

emulate each FPU instruction to completion without interruption.

However, it may happen that when the emulator is accessing the user

memory space, swapping may be needed. In this case the emulator may be

temporarily suspended while disk i/o takes place. During this time

another process may use the emulator, thereby changing some static

variables (e.g. FPU_st0_ptr). The code which accesses user memory

is confined to five files:

fpu_entry.c

reg_ld_str.c

load_store.c

get_address.c

errors.c

----------------------- Limitations of wm-FPU-emu -----------------------

There are a number of differences between the current wm-FPU-emu

(version beta 1.11) and the 80486 FPU (apart from bugs). Some of the

more important differences are listed below:

The Roundup flag does not have much meaning for the transcendental

functions and its 80486 value with these functions is likely to differ

from its emulator value.

In a few rare cases the Underflow flag obtained with the emulator will

be different from that obtained with an 80486. This occurs when the

following conditions apply simultaneously:

(a) the operands have a higher precision than the current setting of the

precision control (PC) flags.

(b) the underflow exception is masked.

(c) the magnitude of the exact result (before rounding) is less than 2^-16382.

(d) the magnitude of the final result (after rounding) is exactly 2^-16382.

(e) the magnitude of the exact result would be exactly 2^-16382 if the

operands were rounded to the current precision before the arithmetic

operation was performed.

If all of these apply, the emulator will set the Underflow flag but a real

80486 will not.

NOTE: Certain formats of Extended Real are UNSUPPORTED. They are

unsupported by the 80486. They are the Pseudo-NaNs, Pseudoinfinities,

and Unnormals. None of these will be generated by an 80486 or by the

emulator. Do not use them. The emulator treats them differently in

detail from the way an 80486 does.

The emulator treats PseudoDenormals differently from an 80486. These

numbers are in fact properly normalised numbers with the exponent

offset by 1, and the emulator treats them as such. Unlike the 80486,

the emulator does not generate a Denormal Operand exception for these

numbers. The arithmetical results produced when using such a number as

an operand are the same for the emulator and a real 80486 (apart from

any slight precision difference for the transcendental functions).

Neither the emulator nor an 80486 produces one of these numbers as the

result of any arithmetic operation. An 80486 can keep one of these

numbers in an FPU register with its identity as a PseudoDenormal, but

the emulator will not; they are always converted to a valid number.
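The encodings named above can be told apart from the raw bits of an 80-bit Extended Real. The following is a minimal sketch based on the standard x87 layout (15-bit exponent, explicit integer bit in bit 63 of the significand); the enum and the function name `classify_extended` are hypothetical and are not part of the emulator.

```c
#include <assert.h>
#include <stdint.h>

/* Classes of 80-bit Extended Real encodings (names are illustrative). */
enum ext_class {
    EXT_ZERO, EXT_DENORMAL, EXT_PSEUDO_DENORMAL,
    EXT_NORMAL, EXT_UNNORMAL,
    EXT_INFINITY, EXT_NAN,
    EXT_PSEUDO_INFINITY, EXT_PSEUDO_NAN
};

/* sign_exp:    sign bit (bit 15) plus 15-bit biased exponent
 * significand: 64 bits, with the explicit integer bit in bit 63 */
enum ext_class classify_extended(uint16_t sign_exp, uint64_t significand)
{
    unsigned exponent = sign_exp & 0x7FFFu;
    int integer_bit = (int)(significand >> 63);
    uint64_t fraction = significand & 0x7FFFFFFFFFFFFFFFULL;

    if (exponent == 0) {
        /* Integer bit set with a zero exponent: PseudoDenormal. */
        if (integer_bit)
            return EXT_PSEUDO_DENORMAL;
        return fraction ? EXT_DENORMAL : EXT_ZERO;
    }
    if (exponent == 0x7FFFu) {
        /* Integer bit clear with a maximum exponent: pseudo forms. */
        if (integer_bit)
            return fraction ? EXT_NAN : EXT_INFINITY;
        return fraction ? EXT_PSEUDO_NAN : EXT_PSEUDO_INFINITY;
    }
    /* Ordinary exponent: integer bit clear means an Unnormal. */
    return integer_bit ? EXT_NORMAL : EXT_UNNORMAL;
}
```

A PseudoDenormal has the integer bit set, so its significand is already normalised; only the exponent field (0, interpreted as 1) is unusual, which matches the description above.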

Self modifying code can cause the emulator to fail. An example of such

code is:

movl %esp,[%ebx]

fld1

The FPU instruction may be (usually will be) loaded into the pre-fetch

queue of the cpu before the mov instruction is executed. If the

destination of the 'movl' overlaps the FPU instruction then the bytes

in the prefetch queue and memory will be inconsistent when the FPU

instruction is executed. The emulator will be invoked but will not be

able to find the instruction which caused the device-not-present

exception. For this case, the emulator cannot emulate the behaviour of

an 80486DX.

Handling of the address size override prefix byte (0x67) has not been

extensively tested yet. A major problem exists because using it in

vm86 mode can cause a general protection fault. Address offsets

greater than 0xffff appear to be illegal in vm86 mode but are quite

acceptable (and work) in real mode. A small test program developed to

check the addressing, and which runs successfully in real mode,

crashes dosemu under Linux and also brings Windows down with a general

protection fault message when run under the MS-DOS prompt of Windows

3.1. (The program simply reads data from a valid address).

----------------------- Performance of wm-FPU-emu -----------------------

Speed.

-----

The speed of floating point computation with the emulator will depend

upon instruction mix. Relative performance is best for the instructions

which require most computation. The simple instructions are adversely

affected by the fpu instruction trap overhead.

Timing: Some simple timing tests have been made on the emulator functions.

The times include load/store instructions. All times are in microseconds

measured on a 33MHz 386 with 64k cache. The Turbo C tests were under

ms-dos; the next two columns are for emulators running with the djgpp

ms-dos extender. The final column is for wm-FPU-emu in Linux 0.97,

using libm4.0 (hard).

| function | Turbo C | djgpp 1.06 | WM-emu387 | wm-FPU-emu |
|---|---|---|---|---|
| + | 60.5 | 154.8 | 76.5 | 139.4 |
| - | 61.1-65.5 | 157.3-160.8 | 76.2-79.5 | 142.9-144.7 |
| * | 71.0 | 190.8 | 79.6 | 146.6 |
| / | 61.2-75.0 | 261.4-266.9 | 75.3-91.6 | 142.2-158.1 |
| sin() | 310.8 | 4692.0 | 319.0 | 398.5 |
| cos() | 284.4 | 4855.2 | 308.0 | 388.7 |
| tan() | 495.0 | 8807.1 | 394.9 | 504.7 |
| atan() | 328.9 | 4866.4 | 601.1 | 419.5-491.9 |
| sqrt() | 128.7 | crashed | 145.2 | 227.0 |
| log() | 413.1-419.1 | 5103.4-5354.21 | 254.7-282.2 | 409.4-437.1 |
| exp() | 479.1 | 6619.2 | 469.1 | 850.8 |

The performance under Linux is improved by the use of look-ahead code.

The following results show the improvement which is obtained under

Linux due to the look-ahead code. Also given are the times for the

original Linux emulator with the 4.1 'soft' lib.

[ Linus' note: I changed look-ahead to be the default under linux, as

there was no reason not to use it after I had edited it to be

disabled during tracing ]

| function | wm-FPU-emu w look-ahead | original w 'soft' lib |
|---|---|---|
| + | 106.4 | 190.2 |
| - | 108.6-111.6 | 192.4-216.2 |
| * | 113.4 | 193.1 |
| / | 108.8-124.4 | 700.1-706.2 |
| sin() | 390.5 | 2642.0 |
| cos() | 381.5 | 2767.4 |
| tan() | 496.5 | 3153.3 |
| atan() | 367.2-435.5 | 2439.4-3396.8 |
| sqrt() | 195.1 | 4732.5 |
| log() | 358.0-387.5 | 3359.2-3390.3 |
| exp() | 619.3 | 4046.4 |

These figures are now somewhat out-of-date. The emulator has become

progressively slower for most functions as more of the 80486 features

have been implemented.

----------------------- Accuracy of wm-FPU-emu -----------------------

Accuracy: The following table gives the accuracy of the sqrt(), trig

and log functions. Each function was tested at about 400 points. Ideal

results would be 64 bits. The reduced accuracy of cos() and tan() for

arguments greater than pi/4 can be thought of as being due to the

precision of the argument x; e.g. an argument of pi/2-(1e-10) which is

accurate to 64 bits can result in a relative accuracy in cos() of about

64 + log2(cos(x)) = 31 bits. Results for the Turbo C emulator are given

in the last column.

| Function | Tested x range | Worst result (relative bits) | Turbo C |
|---|---|---|---|
| sqrt(x) | 1 .. 2 | 64.1 | 63.2 |
| atan(x) | 1e-10 .. 200 | 62.6 | 62.8 |
| cos(x) | 0 .. pi/2-(1e-10) | 63.2 (x <= pi/4) | 62.4 |
| | | 35.2 (x = pi/2-(1e-10)) | 31.9 |
| sin(x) | 1e-10 .. pi/2 | 63.0 | 62.8 |
| tan(x) | 1e-10 .. pi/2-(1e-10) | 62.4 (x <= pi/4) | 62.1 |
| | | 35.2 (x = pi/2-(1e-10)) | 31.9 |
| exp(x) | 0 .. 1 | 63.1 | 62.9 |
| log(x) | 1+1e-6 .. 2 | 62.4 | 62.1 |

As of version 1.3 of the emulator, the accuracy of the basic

arithmetic has been improved (by a small fraction of a bit). Care has

been taken to ensure full accuracy of the rounding of the basic

arithmetic functions (+,-,*,/,and fsqrt), and they all now produce

results which are exact to the 64th bit (unless there are any bugs

left). To ensure this, it was necessary to effectively get information

of up to about 128 bits precision. The emulator now passes the

"paranoia" tests (compiled with gcc 2.3.3) for 'float' variables (24

bit precision numbers) when precision control is set to 24, 53 or 64

bits, and for 'double' variables (53 bit precision numbers) when

precision control is set to 53 bits (a properly performing FPU cannot

pass the 'paranoia' tests for 'double' variables when precision

control is set to 64 bits).

For version 1.5, the accuracy of fprem and fprem1 has been improved.

These functions now produce exact results. The code for reducing the

argument for the trig functions (fsin, fcos, fptan and fsincos) has

been improved and now effectively uses a value for pi which is

accurate to more than 128 bits precision. As a consequence, the

accuracy of these functions for large arguments has been dramatically

improved (and is now very much better than an 80486 FPU). There is

also now no degradation of accuracy for fcos and fptan for operands

close to pi/2. Measured results are (note that the definition of

accuracy has changed slightly from that used for the above table):

| Function | Tested x range | Worst result (absolute bits) |
|---|---|---|
| cos(x) | 0 .. 9.22e+18 | 62.0 |
| sin(x) | 1e-16 .. 9.22e+18 | 62.1 |
| tan(x) | 1e-16 .. 9.22e+18 | 61.8 |

It is possible with some effort to find very large arguments which

give much degraded precision. For example, the integer number

8227740058411162616.0

is within about 10e-7 of a multiple of pi. To find the tan (for

example) of this number to 64 bits precision it would be necessary to

have a value of pi which had about 150 bits precision. The FPU

emulator computes the result to about 42.6 bits precision (the correct

result is about -9.739715e-8). On the other hand, an 80486 FPU returns

0.01059, which in relative terms is hopelessly inaccurate.

For arguments close to critical angles (which occur at multiples of

pi/2) the emulator is more accurate than an 80486 FPU. For very large

arguments, the emulator is far more accurate.

------------------------- Contributors -------------------------------

A number of people have contributed to the development of the

emulator, often by just reporting bugs, sometimes with suggested

fixes, and a few kind people have provided me with access in one way

or another to an 80486 machine. Contributors include (to those people

who I may have forgotten, please forgive me):

Linus Torvalds

[email protected]

[email protected]

Nick Holloway, [email protected]

Hermano Moura, [email protected]

Jon Jagger, [email protected]

Lennart Benschop

Brian Gallew, [email protected]

Thomas Staniszewski, [email protected]

Martin Howell, [email protected]

M Saggaf, [email protected]

Peter Barker, [email protected]

[email protected]

Dan Russel, [email protected]

Daniel Carosone, [email protected]

[email protected]

Hamish Coleman, [email protected]

Bruce Evans, [email protected]

Timo Korvola, [email protected]

Rick Lyons, [email protected]

...and numerous others who responded to my request for help with

a real 80486.

As of version 1.3 of the emulator, the accuracy of the basic

arithmetic has been improved (by a small fraction of a bit). Care has

been taken to ensure full accuracy of the rounding of the basic

arithmetic functions (+,-,*,/,and fsqrt), and they all now produce

results which are exact to the 64th bit (unless there are any bugs

left). To ensure this, it was necessary to effectively get information

of up to about 128 bits precision. The emulator now passes the

"paranoia" tests (compiled with gcc 2.3.3) for 'float' variables (24

bit precision numbers) when precision control is set to 24, 53 or 64

bits, and for 'double' variables (53 bit precision numbers) when

precision control is set to 53 bits (a properly performing FPU cannot

pass the 'paranoia' tests for 'double' variables when precision

control is set to 64 bits).

For version 1.5, the accuracy of fprem and fprem1 has been improved.

These functions now produce exact results. The code for reducing the

argument for the trig functions (fsin, fcos, fptan and fsincos) has

been improved and now effectively uses a value for pi which is

accurate to more than 128 bits precision. As a consquence, the

accuracy of these functions for large arguments has been dramatically

improved (and is now very much better than an 80486 FPU). There is

also now no degradation of accuracy for fcos and ftan for operands

close to pi/2. Measured results are (note that the definition of

accuracy has changed slightly from that used for the above table):

Function Tested x range Worst result

(absolute bits)

cos(x) 0 .. 9.22e+18 62.0

sin(x) 1e-16 .. 9.22e+18 62.1

tan(x) 1e-16 .. 9.22e+18 61.8

It is possible with some effort to find very large arguments which

give much degraded precision. For example, the integer number

8227740058411162616.0

is within about 10e-7 of a multiple of pi. To find the tan (for

example) of this number to 64 bits precision it would be necessary to

have a value of pi which had about 150 bits precision. The FPU

emulator computes the result to about 42.6 bits precision (the correct

result is about -9.739715e-8). On the other hand, an 80486 FPU returns

0.01059, which in relative terms is hopelessly inaccurate.

For arguments close to critical angles (which occur at multiples of

pi/2) the emulator is more accurate than an 80486 FPU. For very large

arguments, the emulator is far more accurate.

------------------------- Contributors -------------------------------

A number of people have contributed to the development of the

emulator, often by just reporting bugs, sometimes with suggested

fixes, and a few kind people have provided me with access in one way

or another to an 80486 machine. Contributors include (to those people

who I may have forgotten, please forgive me):

Linus Torvalds

[email protected]

[email protected]

Nick Holloway, [email protected]

Hermano Moura, [email protected]

Jon Jagger, [email protected]

Lennart Benschop

Brian Gallew, [email protected]

Thomas Staniszewski, [email protected]

Martin Howell, [email protected]

M Saggaf, [email protected]

Peter Barker, [email protected]

[email protected]

Dan Russel, [email protected]

Daniel Carosone, [email protected]

[email protected]

Hamish Coleman, [email protected]

Bruce Evans, [email protected]

Timo Korvola, [email protected]

Rick Lyons, [email protected]

...and numerous others who responded to my request for help with

a real 80486.

## Contents of the README.TXT file

The library LIBRSX.A contains signal, spawnve and ptrace support for DJGPP.

Copy the files

LIBRSX.A -> \djgpp\lib

_PROCESS.H -> \djgpp\include\sys

_USER.H -> \djgpp\include\sys

_PTRACE.H -> \djgpp\include\sys

_REG.H -> \djgpp\include\sys

_SIGNAL.H -> \djgpp\include\sys

+---------------------------------------------------------------------------+

| wm-FPU-emu an FPU emulator for 80386 and 80486SX microprocessors. |

| |

| Copyright (C) 1992,1993,1994 |

| W. Metzenthen, 22 Parker St, Ormond, Vic 3163, |

| Australia. E-mail [email protected] |

| |

| This program is free software; you can redistribute it and/or modify |

| it under the terms of the GNU General Public License version 2 as |

| published by the Free Software Foundation. |

| |

| This program is distributed in the hope that it will be useful, |

| but WITHOUT ANY WARRANTY; without even the implied warranty of |

| MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |

| GNU General Public License for more details. |

| |

| You should have received a copy of the GNU General Public License |

| along with this program; if not, write to the Free Software |

| Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. |

| |

+---------------------------------------------------------------------------+

wm-FPU-emu is an FPU emulator for Linux. It is derived from wm-emu387

which is my 80387 emulator for djgpp (gcc under msdos); wm-emu387 was

in turn based upon emu387 which was written by DJ Delorie for djgpp.

The interface to the Linux kernel is based upon the original Linux

math emulator by Linus Torvalds.

My target FPU for wm-FPU-emu is that described in the Intel486

Programmer's Reference Manual (1992 edition). Unfortunately, numerous

facets of the functioning of the FPU are not well covered in the

Reference Manual. The information in the manual has been supplemented

with measurements on real 80486's. Unfortunately, it is simply not

possible to be sure that all of the peculiarities of the 80486 have

been discovered, so there are always likely to be obscure differences

in the detailed behaviour of the emulator and a real 80486.

wm-FPU-emu does not implement all of the behaviour of the 80486 FPU.

See "Limitations" later in this file for a list of some differences.

Please report bugs, etc to me at:

[email protected]

or at:

[email protected]

--Bill Metzenthen

March 1994

----------------------- Internals of wm-FPU-emu -----------------------

Numeric algorithms:

(1) Add, subtract, and multiply. Nothing remarkable in these.

(2) Divide has been tuned to get reasonable performance. The algorithm

is not the obvious one which most people seem to use, but is designed

to take advantage of the characteristics of the 80386. I expect that

it has been invented many times before I discovered it, but I have not

seen it. It is based upon one of those ideas which one carries around

for years without ever bothering to check it out.

(3) The sqrt function has been tuned to get good performance. It is based

upon Newton's classic method. Performance was improved by capitalizing

upon the properties of Newton's method, and the code is once again

structured taking account of the 80386 characteristics.

(4) The trig, log, and exp functions are based in each case upon quasi-

"optimal" polynomial approximations. My definition of "optimal" was

based upon getting good accuracy with reasonable speed.

(5) The argument reducing code for the trig function effectively uses

a value of pi which is accurate to more than 128 bits. As a consequence,

the reduced argument is accurate to more than 64 bits for arguments up

to a few pi, and accurate to more than 64 bits for most arguments,

even for arguments approaching 2^63. This is far superior to an

80486, which uses a value of pi which is accurate to 66 bits.
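Item (3) above names Newton's method as the basis of the sqrt routine. The emulator's own code is not reproduced in this file, so the following is only a minimal sketch of the technique on integers, assuming a 64-bit significand that stands for a value in [1, 2). The names isqrt_newton and sqrt_significand are invented for this illustration.

```python
import math


def isqrt_newton(n):
    """Integer square root (floor) by Newton's iteration x <- (x + n/x) / 2."""
    if n < 0:
        raise ValueError("square root of a negative number")
    if n == 0:
        return 0
    # Start from a power of two that is >= sqrt(n); the iterates then
    # decrease monotonically until they reach floor(sqrt(n)).
    x = 1 << ((n.bit_length() + 1) // 2)
    while True:
        y = (x + n // x) // 2
        if y >= x:
            return x
        x = y


def sqrt_significand(sig):
    """Square root of a fixed-point value sig / 2**63 with the result on
    the same 2**63 scale (hypothetical layout, for illustration only)."""
    return isqrt_newton(sig << 63)


if __name__ == "__main__":
    one = 1 << 63
    print(sqrt_significand(one) == one)
    print(isqrt_newton(10**30) == math.isqrt(10**30))
```

Each Newton step roughly doubles the number of correct bits, which is presumably the property that the tuning mentioned in item (3) builds on.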

The code of the emulator is complicated slightly by the need to

account for a limited form of re-entrancy. Normally, the emulator will

emulate each FPU instruction to completion without interruption.

However, it may happen that when the emulator is accessing the user

memory space, swapping may be needed. In this case the emulator may be

temporarily suspended while disk i/o takes place. During this time

another process may use the emulator, thereby changing some static

variables (eg FPU_st0_ptr, etc). The code which accesses user memory

is confined to five files:

fpu_entry.c

reg_ld_str.c

load_store.c

get_address.c

errors.c

----------------------- Limitations of wm-FPU-emu -----------------------

There are a number of differences between the current wm-FPU-emu

(version beta 1.11) and the 80486 FPU (apart from bugs). Some of the

more important differences are listed below:

The Roundup flag does not have much meaning for the transcendental

functions and its 80486 value with these functions is likely to differ

from its emulator value.

In a few rare cases the Underflow flag obtained with the emulator will

be different from that obtained with an 80486. This occurs when the

following conditions apply simultaneously:

(a) the operands have a higher precision than the current setting of the

precision control (PC) flags.

(b) the underflow exception is masked.

(c) the magnitude of the exact result (before rounding) is less than 2^-16382.

(d) the magnitude of the final result (after rounding) is exactly 2^-16382.

(e) the magnitude of the exact result would be exactly 2^-16382 if the

operands were rounded to the current precision before the arithmetic

operation was performed.

If all of these apply, the emulator will set the Underflow flag but a real

80486 will not.

NOTE: Certain formats of Extended Real are UNSUPPORTED. They are

unsupported by the 80486. They are the Pseudo-NaNs, Pseudoinfinities,

and Unnormals. None of these will be generated by an 80486 or by the

emulator. Do not use them. The emulator treats them differently in

detail from the way an 80486 does.

The emulator treats PseudoDenormals differently from an 80486. These

numbers are in fact properly normalised numbers with the exponent

offset by 1, and the emulator treats them as such. Unlike the 80486,

the emulator does not generate a Denormal Operand exception for these

numbers. The arithmetical results produced when using such a number as

an operand are the same for the emulator and a real 80486 (apart from

any slight precision difference for the transcendental functions).

Neither the emulator nor an 80486 produces one of these numbers as the

result of any arithmetic operation. An 80486 can keep one of these

numbers in an FPU register with its identity as a PseudoDenormal, but

the emulator will not; they are always converted to a valid number.
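The encodings named in the two paragraphs above can be told apart from the 15-bit exponent field and the 64-bit significand of an Extended Real. The sketch below is not taken from the emulator; it classifies an encoding according to the standard x87 extended format and shows why a PseudoDenormal is "a properly normalised number with the exponent offset by 1". The function names are invented for this illustration, and the sign bit is ignored.

```python
from fractions import Fraction

EXP_BIAS = 16383        # exponent bias of the 80-bit extended format
EXP_MAX = 0x7FFF        # all-ones exponent field
INT_BIT = 1 << 63       # explicit integer bit of the significand


def classify(exponent, significand):
    """Name the class of an extended-real encoding (sign ignored)."""
    integer_bit = bool(significand & INT_BIT)
    fraction = significand & (INT_BIT - 1)
    if exponent == 0:
        if integer_bit:
            return "pseudo-denormal"
        return "zero" if fraction == 0 else "denormal"
    if exponent == EXP_MAX:
        if integer_bit:
            return "infinity" if fraction == 0 else "nan"
        return "pseudo-infinity" if fraction == 0 else "pseudo-nan"
    return "normal" if integer_bit else "unnormal"


def magnitude(exponent, significand):
    """Exact magnitude of a finite encoding.

    An exponent field of 0 is interpreted as 1, which is what makes a
    pseudo-denormal equal to an ordinary normalised number.
    """
    if exponent == EXP_MAX:
        raise ValueError("not a finite encoding")
    effective = max(exponent, 1)
    return Fraction(significand) * Fraction(2) ** (effective - EXP_BIAS - 63)


if __name__ == "__main__":
    print(classify(0, INT_BIT))
    print(magnitude(0, INT_BIT) == magnitude(1, INT_BIT))
```

With exponent field 0 and the integer bit set, the value is identical to the one encoded with exponent field 1, so converting it to that valid form (as the emulator does) changes nothing numerically.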

Self modifying code can cause the emulator to fail. An example of such

code is:

movl %esp,(%ebx)

fld1

The FPU instruction may be (usually will be) loaded into the pre-fetch

queue of the cpu before the mov instruction is executed. If the

destination of the 'movl' overlaps the FPU instruction then the bytes

in the prefetch queue and memory will be inconsistent when the FPU

instruction is executed. The emulator will be invoked but will not be

able to find the instruction which caused the device-not-present

exception. For this case, the emulator cannot emulate the behaviour of

an 80486DX.

Handling of the address size override prefix byte (0x67) has not been

extensively tested yet. A major problem exists because using it in

vm86 mode can cause a general protection fault. Address offsets

greater than 0xffff appear to be illegal in vm86 mode but are quite

acceptable (and work) in real mode. A small test program developed to

check the addressing, and which runs successfully in real mode,

crashes dosemu under Linux and also brings Windows down with a general

protection fault message when run under the MS-DOS prompt of Windows

3.1. (The program simply reads data from a valid address).

----------------------- Performance of wm-FPU-emu -----------------------

Speed.

-----

The speed of floating point computation with the emulator will depend

upon instruction mix. Relative performance is best for the instructions

which require most computation. The simple instructions are adversely

affected by the fpu instruction trap overhead.

Timing: Some simple timing tests have been made on the emulator functions.

The times include load/store instructions. All times are in microseconds

measured on a 33MHz 386 with 64k cache. The Turbo C tests were under

ms-dos, the next two columns are for emulators running with the djgpp

ms-dos extender. The final column is for wm-FPU-emu in Linux 0.97,

using libm4.0 (hard).

function    Turbo C        djgpp 1.06        WM-emu387      wm-FPU-emu

+           60.5           154.8             76.5           139.4
-           61.1-65.5      157.3-160.8       76.2-79.5      142.9-144.7
*           71.0           190.8             79.6           146.6
/           61.2-75.0      261.4-266.9       75.3-91.6      142.2-158.1
sin()       310.8          4692.0            319.0          398.5
cos()       284.4          4855.2            308.0          388.7
tan()       495.0          8807.1            394.9          504.7
atan()      328.9          4866.4            601.1          419.5-491.9
sqrt()      128.7          crashed           145.2          227.0
log()       413.1-419.1    5103.4-5354.21    254.7-282.2    409.4-437.1
exp()       479.1          6619.2            469.1          850.8
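To make the remark about instruction mix concrete, the short sketch below divides the wm-FPU-emu column by the Turbo C column for the rows of the table that have a single value. The numbers are copied from the table; the dictionary and the helper name slowdown are introduced only for this illustration, and because the two columns were measured under different operating systems the ratios are a rough guide at best.

```python
# (Turbo C, wm-FPU-emu) times in microseconds, taken from the table above.
TIMES = {
    "+": (60.5, 139.4),
    "*": (71.0, 146.6),
    "sin": (310.8, 398.5),
    "cos": (284.4, 388.7),
    "tan": (495.0, 504.7),
    "exp": (479.1, 850.8),
}


def slowdown(op):
    """Ratio of wm-FPU-emu time to Turbo C time for one operation."""
    turbo_c, wm_fpu_emu = TIMES[op]
    return wm_fpu_emu / turbo_c


if __name__ == "__main__":
    for op in TIMES:
        print(f"{op:4s} {slowdown(op):.2f}")
```

The simple operations come out a little over twice as slow, while the trig functions are much closer to parity, which is consistent with a fixed per-instruction trap overhead.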

The performance under Linux is improved by the use of look-ahead code.

The following results show the improvement which is obtained under

Linux due to the look-ahead code. Also given are the times for the

original Linux emulator with the 4.1 'soft' lib.

[ Linus' note: I changed look-ahead to be the default under linux, as

there was no reason not to use it after I had edited it to be

disabled during tracing ]

function    wm-FPU-emu w     original w
            look-ahead       'soft' lib

+           106.4            190.2
-           108.6-111.6      192.4-216.2
*           113.4            193.1
/           108.8-124.4      700.1-706.2
sin()       390.5            2642.0
cos()       381.5            2767.4
tan()       496.5            3153.3
atan()      367.2-435.5      2439.4-3396.8
sqrt()      195.1            4732.5
log()       358.0-387.5      3359.2-3390.3
exp()       619.3            4046.4

These figures are now somewhat out-of-date. The emulator has become

progressively slower for most functions as more of the 80486 features

have been implemented.

----------------------- Accuracy of wm-FPU-emu -----------------------

Accuracy: The following table gives the accuracy of the sqrt(), trig

and log functions. Each function was tested at about 400 points. Ideal

results would be 64 bits. The reduced accuracy of cos() and tan() for

arguments greater than pi/4 can be thought of as being due to the

precision of the argument x; e.g. an argument of pi/2-(1e-10) which is

accurate to 64 bits can result in a relative accuracy in cos() of about

64 + log2(cos(x)) = 31 bits. Results for the Turbo C emulator are given

in the last column.

Function    Tested x range           Worst result               Turbo C
                                     (relative bits)

sqrt(x)     1 .. 2                   64.1                       63.2
atan(x)     1e-10 .. 200             62.6                       62.8
cos(x)      0 .. pi/2-(1e-10)        63.2 (x <= pi/4)           62.4
                                     35.2 (x = pi/2-(1e-10))    31.9
sin(x)      1e-10 .. pi/2            63.0                       62.8
tan(x)      1e-10 .. pi/2-(1e-10)    62.4 (x <= pi/4)           62.1
                                     35.2 (x = pi/2-(1e-10))    31.9
exp(x)      0 .. 1                   63.1                       62.9
log(x)      1+1e-6 .. 2              62.4                       62.1
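The rule of thumb quoted before the table, 64 + log2(cos(x)), can be evaluated directly. The sketch below does this in ordinary double precision, which is enough to reproduce the order of magnitude; the function name is made up for this illustration.

```python
import math


def expected_relative_bits(x, ideal_bits=64):
    """Estimate from the text: ideal_bits + log2(cos(x)).

    Near pi/2 the cosine is tiny, so a fixed uncertainty in the argument
    becomes a large relative uncertainty in the result.
    """
    return ideal_bits + math.log2(math.cos(x))


if __name__ == "__main__":
    x = math.pi / 2 - 1e-10
    print(f"{expected_relative_bits(x):.1f} bits")
```

For x = pi/2-(1e-10) this gives a value just under 31 bits, in line with the figure in the text and with the 31.9 and 35.2 entries in the table.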

As of version 1.3 of the emulator, the accuracy of the basic

arithmetic has been improved (by a small fraction of a bit). Care has

been taken to ensure full accuracy of the rounding of the basic

arithmetic functions (+, -, *, / and fsqrt), and they all now produce

results which are exact to the 64th bit (unless there are any bugs

left). To ensure this, it was necessary to effectively get information

of up to about 128 bits precision. The emulator now passes the

"paranoia" tests (compiled with gcc 2.3.3) for 'float' variables (24

bit precision numbers) when precision control is set to 24, 53 or 64

bits, and for 'double' variables (53 bit precision numbers) when

precision control is set to 53 bits (a properly performing FPU cannot

pass the 'paranoia' tests for 'double' variables when precision

control is set to 64 bits).

For version 1.5, the accuracy of fprem and fprem1 has been improved.

These functions now produce exact results. The code for reducing the

argument for the trig functions (fsin, fcos, fptan and fsincos) has

been improved and now effectively uses a value for pi which is

accurate to more than 128 bits precision. As a consequence, the

accuracy of these functions for large arguments has been dramatically

improved (and is now very much better than an 80486 FPU). There is

also now no degradation of accuracy for fcos and fptan for operands

close to pi/2. Measured results are (note that the definition of

accuracy has changed slightly from that used for the above table):

Function    Tested x range           Worst result
                                     (absolute bits)

cos(x)      0 .. 9.22e+18            62.0
sin(x)      1e-16 .. 9.22e+18        62.1
tan(x)      1e-16 .. 9.22e+18        61.8

It is possible with some effort to find very large arguments which

give much degraded precision. For example, the integer number

8227740058411162616.0

is within about 1e-7 of a multiple of pi. To find the tan (for

example) of this number to 64 bits precision it would be necessary to

have a value of pi which had about 150 bits precision. The FPU

emulator computes the result to about 42.6 bits precision (the correct

result is about -9.739715e-8). On the other hand, an 80486 FPU returns

0.01059, which in relative terms is hopelessly inaccurate.
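The effect of the precision of pi on this example can be reproduced with arbitrary-precision arithmetic. The sketch below is not the emulator's reduction code; it uses Python's decimal module with an 80-digit value of pi as a stand-in for "enough bits", and compares that with a reduction that uses the 53-bit double-precision value math.pi. The helper name reduce_mod_pi is invented for this illustration.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 80  # working precision in decimal digits

# pi to 80 decimal places
PI = Decimal("3.14159265358979323846264338327950288419716939937510"
             "582097494459230781640628620899")


def reduce_mod_pi(x, pi=PI):
    """Return x - k*pi, where k is the integer nearest to x/pi.

    k is always chosen with the accurate PI; only the value of pi used
    in the subtraction varies, so the effect of its precision is isolated.
    """
    x = Decimal(x)
    k = (x / PI).to_integral_value()
    return x - k * pi


X = 8227740058411162616

if __name__ == "__main__":
    exact = reduce_mod_pi(X)
    naive = reduce_mod_pi(X, Decimal(math.pi))
    print("tan with 80-digit pi:", math.tan(float(exact)))
    print("error of 53-bit reduction:", float(naive - exact))
```

The first line printed can be compared with the value of about -9.739715e-8 quoted above. The second shows that a reduction with a 53-bit pi is off by hundreds of radians for this argument, so the precision of pi, not the polynomial approximation, dominates the error for very large arguments.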

For arguments close to critical angles (which occur at multiples of

pi/2) the emulator is more accurate than an 80486 FPU. For very large

arguments, the emulator is far more accurate.

------------------------- Contributors -------------------------------

A number of people have contributed to the development of the

emulator, often by just reporting bugs, sometimes with suggested

fixes, and a few kind people have provided me with access in one way

or another to an 80486 machine. Contributors include (to those people

who I may have forgotten, please forgive me):

Linus Torvalds

[email protected]

[email protected]

Nick Holloway, [email protected]

Hermano Moura, [email protected]

Jon Jagger, [email protected]

Lennart Benschop

Brian Gallew, [email protected]

Thomas Staniszewski, [email protected]

Martin Howell, [email protected]

M Saggaf, [email protected]

Peter Barker, [email protected]

[email protected]

Dan Russel, [email protected]

Daniel Carosone, [email protected]

[email protected]

Hamish Coleman, [email protected]

Bruce Evans, [email protected]

Timo Korvola, [email protected]

Rick Lyons, [email protected]

...and numerous others who responded to my request for help with

a real 80486.

--------------------------------------------------------------------------------

README.TXT Release: RSX 5 (c) Rainer Schnitker Dec 1994

--------------------------------------------------------------------------------

This program is free software; you can redistribute it and/or modify

it under the terms of the GNU General Public License version 2 as

published by the Free Software Foundation.

--------------------------------------------------------------------------------

This is RSX, the DPMI-extender for EMX and DJGPP programs.

RSX runs GCC programs under DPMI servers such as MS-Windows 3.1

and emulates a missing 387/487 coprocessor.

News in release 5:

- 32 bit RSX extender default now

- pipes, software scheduler; 'gcc -pipe' works

- rsx can run Emacs 19.27 (emx)

- new emx syscalls (select, bsd signals, pipe)

- tested with OS/2 Warp; _memaccess() workaround

- int10 vesa support

- bugs fixed

UNZIP RSX-ARCHIVE:

First you should delete older RSX versions.

Unpack this archive with unzip50 or pkunzip2

C:\> UNZIP A:\DPMIGCC5.ZIP

C:\> PKUNZIP -d -) A:\DPMIGCC5.ZIP

Please read INSTALL.TXT in this archive before using this software.

WHY USE RSX FOR EMX PROGRAMS:

-----------------------------

RSX can run unmodified EMX 0.8-0.9a programs under DPMI 0.9/1.0.

The compiler and other EMX programs can run in an MS-Windows or OS/2 DOS box.

RSX also supports system calls that are not available under EMX+DOS.

WHY USE RSX FOR DJGPP PROGRAMS:

-------------------------------

RSX can run the DJGPP 1.08-1.11 compiler under DPMI 0.9/1.0.

You don't need a coprocessor; an FPU emulator does that job.

RSX also supports ptrace(), signal(), wait(), fork() and spawnve(P_DEBUG).

If you use the DJGPP compiler only under DPMI, gcc + rsx is faster than

gcc + go32.

Mail to: [email protected]

Home FTP-server: ftp.uni-bielefeld.de

RSX directory is: /pub/systems/msdos/misc

This server also contains:

RSXWDK 2 (available in 12/94)

building 32 bit MS-Windows Apps with EMX+GCC and DJGPP

includes the windows extender RSXW32

RSXWIN 2a

running EMX text-mode programs in a MS-Windows 3.1 window

Copy the files

LIBRSX.A -> \djgpp\lib

_PROCESS.H -> \djgpp\include\sys

_USER.H -> \djgpp\include\sys

_PTRACE.H -> \djgpp\include\sys

_REG.H -> \djgpp\include\sys

_SIGNAL.H -> \djgpp\include\sys

+---------------------------------------------------------------------------+

| wm-FPU-emu an FPU emulator for 80386 and 80486SX microprocessors. |

| |

| Copyright (C) 1992,1993,1994 |

| W. Metzenthen, 22 Parker St, Ormond, Vic 3163, |

| Australia. E-mail [email protected] |

| |

| This program is free software; you can redistribute it and/or modify |

| it under the terms of the GNU General Public License version 2 as |

| published by the Free Software Foundation. |

| |

| This program is distributed in the hope that it will be useful, |

| but WITHOUT ANY WARRANTY; without even the implied warranty of |

| MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |

| GNU General Public License for more details. |

| |

| You should have received a copy of the GNU General Public License |

| along with this program; if not, write to the Free Software |

| Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. |

| |

+---------------------------------------------------------------------------+

wm-FPU-emu is an FPU emulator for Linux. It is derived from wm-emu387

which is my 80387 emulator for djgpp (gcc under msdos); wm-emu387 was

in turn based upon emu387 which was written by DJ Delorie for djgpp.

The interface to the Linux kernel is based upon the original Linux

math emulator by Linus Torvalds.

My target FPU for wm-FPU-emu is that described in the Intel486

Programmer's Reference Manual (1992 edition). Unfortunately, numerous

facets of the functioning of the FPU are not well covered in the

Reference Manual. The information in the manual has been supplemented

with measurements on real 80486's. Unfortunately, it is simply not

possible to be sure that all of the peculiarities of the 80486 have

been discovered, so there is always likely to be obscure differences

in the detailed behaviour of the emulator and a real 80486.

wm-FPU-emu does not implement all of the behaviour of the 80486 FPU.

See "Limitations" later in this file for a list of some differences.

Please report bugs, etc to me at:

[email protected]

or at:

[email protected]

--Bill Metzenthen

March 1994

----------------------- Internals of wm-FPU-emu -----------------------

Numeric algorithms:

(1) Add, subtract, and multiply. Nothing remarkable in these.

(2) Divide has been tuned to get reasonable performance. The algorithm

is not the obvious one which most people seem to use, but is designed

to take advantage of the characteristics of the 80386. I expect that

it has been invented many times before I discovered it, but I have not

seen it. It is based upon one of those ideas which one carries around

for years without ever bothering to check it out.

(3) The sqrt function has been tuned to get good performance. It is based

upon Newton's classic method. Performance was improved by capitalizing

upon the properties of Newton's method, and the code is once again

structured taking account of the 80386 characteristics.

(4) The trig, log, and exp functions are based in each case upon quasi-

"optimal" polynomial approximations. My definition of "optimal" was

based upon getting good accuracy with reasonable speed.

(5) The argument reducing code for the trig function effectively uses

a value of pi which is accurate to more than 128 bits. As a consequence,

the reduced argument is accurate to more than 64 bits for arguments up

to a few pi, and accurate to more than 64 bits for most arguments,

even for arguments approaching 2^63. This is far superior to an

80486, which uses a value of pi which is accurate to 66 bits.

The code of the emulator is complicated slightly by the need to

account for a limited form of re-entrancy. Normally, the emulator will

emulate each FPU instruction to completion without interruption.

However, it may happen that when the emulator is accessing the user

memory space, swapping may be needed. In this case the emulator may be

temporarily suspended while disk i/o takes place. During this time

another process may use the emulator, thereby changing some static

variables (eg FPU_st0_ptr, etc). The code which accesses user memory

is confined to five files:

fpu_entry.c

reg_ld_str.c

load_store.c

get_address.c

errors.c

----------------------- Limitations of wm-FPU-emu -----------------------

There are a number of differences between the current wm-FPU-emu

(version beta 1.11) and the 80486 FPU (apart from bugs). Some of the

more important differences are listed below:

The Roundup flag does not have much meaning for the transcendental

functions and its 80486 value with these functions is likely to differ

from its emulator value.

In a few rare cases the Underflow flag obtained with the emulator will

be different from that obtained with an 80486. This occurs when the

following conditions apply simultaneously:

(a) the operands have a higher precision than the current setting of the

precision control (PC) flags.

(b) the underflow exception is masked.

(c) the magnitude of the exact result (before rounding) is less than 2^-16382.

(d) the magnitude of the final result (after rounding) is exactly 2^-16382.

(e) the magnitude of the exact result would be exactly 2^-16382 if the

operands were rounded to the current precision before the arithmetic

operation was performed.

If all of these apply, the emulator will set the Underflow flag but a real

80486 will not.

NOTE: Certain formats of Extended Real are UNSUPPORTED. They are

unsupported by the 80486. They are the Pseudo-NaNs, Pseudoinfinities,

and Unnormals. None of these will be generated by an 80486 or by the

emulator. Do not use them. The emulator treats them differently in

detail from the way an 80486 does.

The emulator treats PseudoDenormals differently from an 80486. These

numbers are in fact properly normalised numbers with the exponent

offset by 1, and the emulator treats them as such. Unlike the 80486,

the emulator does not generate a Denormal Operand exception for these

numbers. The arithmetical results produced when using such a number as

an operand are the same for the emulator and a real 80486 (apart from

any slight precision difference for the transcendental functions).

Neither the emulator nor an 80486 produces one of these numbers as the

result of any arithmetic operation. An 80486 can keep one of these

numbers in an FPU register with its identity as a PseudoDenormal, but

the emulator will not; they are always converted to a valid number.

Self modifying code can cause the emulator to fail. An example of such

code is:

movl %esp,[%ebx]

fld1

The FPU instruction may be (usually will be) loaded into the pre-fetch

queue of the cpu before the mov instruction is executed. If the

destination of the 'movl' overlaps the FPU instruction then the bytes

in the prefetch queue and memory will be inconsistent when the FPU

instruction is executed. The emulator will be invoked but will not be

able to find the instruction which caused the device-not-present

exception. For this case, the emulator cannot emulate the behaviour of

an 80486DX.

Handling of the address size override prefix byte (0x67) has not been

extensively tested yet. A major problem exists because using it in

vm86 mode can cause a general protection fault. Address offsets

greater than 0xffff appear to be illegal in vm86 mode but are quite

acceptable (and work) in real mode. A small test program developed to

check the addressing, and which runs successfully in real mode,

crashes dosemu under Linux and also brings Windows down with a general

protection fault message when run under the MS-DOS prompt of Windows

3.1. (The program simply reads data from a valid address).

----------------------- Performance of wm-FPU-emu -----------------------

Speed.

-----

The speed of floating point computation with the emulator will depend

upon instruction mix. Relative performance is best for the instructions

which require most computation. The simple instructions are adversely

affected by the fpu instruction trap overhead.

Timing: Some simple timing tests have been made on the emulator functions.

The times include load/store instructions. All times are in microseconds

measured on a 33MHz 386 with 64k cache. The Turbo C tests were under

ms-dos, the next two columns are for emulators running with the djgpp

ms-dos extender. The final column is for wm-FPU-emu in Linux 0.97,

using libm4.0 (hard).

function Turbo C djgpp 1.06 WM-emu387 wm-FPU-emu

+ 60.5 154.8 76.5 139.4

- 61.1-65.5 157.3-160.8 76.2-79.5 142.9-144.7

* 71.0 190.8 79.6 146.6

/ 61.2-75.0 261.4-266.9 75.3-91.6 142.2-158.1

sin() 310.8 4692.0 319.0 398.5

cos() 284.4 4855.2 308.0 388.7

tan() 495.0 8807.1 394.9 504.7

atan() 328.9 4866.4 601.1 419.5-491.9

sqrt() 128.7 crashed 145.2 227.0

log() 413.1-419.1 5103.4-5354.21 254.7-282.2 409.4-437.1

exp() 479.1 6619.2 469.1 850.8

The performance under Linux is improved by the use of look-ahead code.

The following results show the improvement which is obtained under

Linux due to the look-ahead code. Also given are the times for the

original Linux emulator with the 4.1 'soft' lib.

[ Linus' note: I changed look-ahead to be the default under linux, as

there was no reason not to use it after I had edited it to be

disabled during tracing ]

wm-FPU-emu w original w

look-ahead 'soft' lib

+ 106.4 190.2

- 108.6-111.6 192.4-216.2

* 113.4 193.1

/ 108.8-124.4 700.1-706.2

sin() 390.5 2642.0

cos() 381.5 2767.4

tan() 496.5 3153.3

atan() 367.2-435.5 2439.4-3396.8

sqrt() 195.1 4732.5

log() 358.0-387.5 3359.2-3390.3

exp() 619.3 4046.4

These figures are now somewhat out-of-date. The emulator has become

progressively slower for most functions as more of the 80486 features

have been implemented.

----------------------- Accuracy of wm-FPU-emu -----------------------

Accuracy: The following table gives the accuracy of the sqrt(), trig

and log functions. Each function was tested at about 400 points. Ideal

results would be 64 bits. The reduced accuracy of cos() and tan() for

arguments greater than pi/4 can be thought of as being due to the

precision of the argument x; e.g. an argument of pi/2-(1e-10) which is

accurate to 64 bits can result in a relative accuracy in cos() of about

64 + log2(cos(x)) = 31 bits. Results for the Turbo C emulator are given

in the last column.

Function   Tested x range           Worst result               Turbo C
                                    (relative bits)

sqrt(x)    1 .. 2                   64.1                       63.2
atan(x)    1e-10 .. 200             62.6                       62.8
cos(x)     0 .. pi/2-(1e-10)        63.2 (x <= pi/4)           62.4
                                    35.2 (x = pi/2-(1e-10))    31.9
sin(x)     1e-10 .. pi/2            63.0                       62.8
tan(x)     1e-10 .. pi/2-(1e-10)    62.4 (x <= pi/4)           62.1
                                    35.2 (x = pi/2-(1e-10))    31.9
exp(x)     0 .. 1                   63.1                       62.9
log(x)     1+1e-6 .. 2              62.4                       62.1
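
The estimate 64 + log2(cos(x)) quoted above for an argument near pi/2
can be checked with a few lines of arithmetic. This is only a sketch of
that one formula, evaluated in ordinary double precision; the function
name and the default of 64 argument bits are choices made for the
example, not part of the emulator.

```python
import math


def relative_bits_left(x, argument_bits=64):
    """Estimate from the text: bits of relative accuracy left in cos(x)
    when x itself is only accurate to argument_bits bits."""
    return argument_bits + math.log2(abs(math.cos(x)))


x = math.pi / 2 - 1e-10
print(f"cos(x) = {math.cos(x):.3e}")
print(f"estimated relative accuracy = {relative_bits_left(x):.1f} bits")
```

cos(x) is about 1e-10 here, so roughly 33 of the 64 bits are lost,
which is where the figure of about 31 bits comes from.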

As of version 1.3 of the emulator, the accuracy of the basic

arithmetic has been improved (by a small fraction of a bit). Care has

been taken to ensure full accuracy of the rounding of the basic

arithmetic functions (+,-,*,/,and fsqrt), and they all now produce

results which are exact to the 64th bit (unless there are any bugs

left). To ensure this, it was necessary to effectively get information

of up to about 128 bits precision. The emulator now passes the

"paranoia" tests (compiled with gcc 2.3.3) for 'float' variables (24

bit precision numbers) when precision control is set to 24, 53 or 64

bits, and for 'double' variables (53 bit precision numbers) when

precision control is set to 53 bits (a properly performing FPU cannot

pass the 'paranoia' tests for 'double' variables when precision

control is set to 64 bits).

For version 1.5, the accuracy of fprem and fprem1 has been improved.

These functions now produce exact results. The code for reducing the

argument for the trig functions (fsin, fcos, fptan and fsincos) has

been improved and now effectively uses a value for pi which is

accurate to more than 128 bits precision. As a consequence, the

accuracy of these functions for large arguments has been dramatically

improved (and is now very much better than an 80486 FPU). There is

also now no degradation of accuracy for fcos and fptan for operands

close to pi/2. Measured results are (note that the definition of

accuracy has changed slightly from that used for the above table):

Function   Tested x range          Worst result
                                   (absolute bits)

cos(x)     0 .. 9.22e+18           62.0
sin(x)     1e-16 .. 9.22e+18       62.1
tan(x)     1e-16 .. 9.22e+18       61.8

It is possible with some effort to find very large arguments which

give much degraded precision. For example, the integer number

8227740058411162616.0

is within about 1e-7 of a multiple of pi. To find the tan (for

example) of this number to 64 bits precision it would be necessary to

have a value of pi which had about 150 bits precision. The FPU

emulator computes the result to about 42.6 bits precision (the correct

result is about -9.739715e-8). On the other hand, an 80486 FPU returns

0.01059, which in relative terms is hopelessly inaccurate.
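
The reduction for this argument can be reproduced with arbitrary
precision decimal arithmetic. The sketch below is not the emulator's
method; it uses Python's decimal module and a 50-place value of pi
simply to show how close the argument lies to a multiple of pi and
roughly how many bits of pi a 64-bit result would need. All names are
invented for the example.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 80  # working precision in decimal digits

# pi to 50 decimal places, far more than this argument needs.
PI = Decimal("3.14159265358979323846264338327950288419716939937510")

x = Decimal("8227740058411162616")

# Nearest multiple of pi, and the signed distance from it.
k = (x / PI).to_integral_value()
r = x - k * PI

# tan has period pi, and for |r| this small tan(r) differs from r
# only by about r**3 / 3, so r itself is a good stand-in for tan(x).
tan_x = float(r)

# Bits of pi needed for a 64-bit result: the 64 result bits plus the
# bits cancelled when k*pi is subtracted from x.
bits_of_pi_needed = 64 + math.log2(float(x) / abs(float(r)))

print(f"distance from k*pi: {tan_x:.6e}")
print(f"bits of pi needed:  {bits_of_pi_needed:.0f}")
```

The same reduction done with a pi that is only accurate to 64 bits
would lose almost all of the result's significant bits, since the
error in pi is multiplied by k, which is about 2.6e18 here.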

For arguments close to critical angles (which occur at multiples of

pi/2) the emulator is more accurate than an 80486 FPU. For very large

arguments, the emulator is far more accurate.

------------------------- Contributors -------------------------------

A number of people have contributed to the development of the

emulator, often by just reporting bugs, sometimes with suggested

fixes, and a few kind people have provided me with access in one way

or another to an 80486 machine. Contributors include (to those people

who I may have forgotten, please forgive me):

Linus Torvalds

[email protected]

[email protected]

Nick Holloway, [email protected]

Hermano Moura, [email protected]

Jon Jagger, [email protected]

Lennart Benschop

Brian Gallew, [email protected]

Thomas Staniszewski, [email protected]

Martin Howell, [email protected]

M Saggaf, [email protected]

Peter Barker, [email protected]

[email protected]

Dan Russel, [email protected]

Daniel Carosone, [email protected]

[email protected]

Hamish Coleman, [email protected]

Bruce Evans, [email protected]

Timo Korvola, [email protected]

Rick Lyons, [email protected]

...and numerous others who responded to my request for help with

a real 80486.

--------------------------------------------------------------------------------

README.TXT Release: RSX 5 (c) Rainer Schnitker Dec 1994

--------------------------------------------------------------------------------

This program is free software; you can redistribute it and/or modify

it under the terms of the GNU General Public License version 2 as

published by the Free Software Foundation.

--------------------------------------------------------------------------------

This is RSX, the DPMI-extender for EMX and DJGPP programs.

RSX can run the GCC programs under DPMI-servers like MS-Windows 3.1

and simulates a missing 387/487.

News in release 5:

- 32 bit RSX extender default now

- pipes, software scheduler; 'gcc -pipe' works

- rsx can run Emacs 19.27 (emx)

- new emx syscalls (select, bsd signals, pipe)

- tested with OS/2 Warp; _memaccess() workaround

- int10 vesa support

- bugs fixed

UNZIP RSX-ARCHIVE:

First you should delete older RSX versions.

Unpack this archive with unzip50 or pkunzip2

C:\> UNZIP A:\DPMIGCC5.ZIP

C:\> PKUNZIP -d -) A:\DPMIGCC5.ZIP

Please read INSTALL.TXT in this archive before using this software.

WHY USE RSX FOR EMX PROGRAMS:

-----------------------------

RSX can run unmodified EMX 0.8-0.9a programs under DPMI 0.9/1.0.

The compiler and other EMX programs can run in a MS-Windows or OS/2 DOS-box.

RSX also supports system calls that are not available under EMX+DOS.

WHY USE RSX FOR DJGPP PROGRAMS:

-------------------------------

RSX can run the DJGPP 1.08-1.11 compiler under DPMI 0.9/1.0.

You don't need a coprocessor; an FPU emulator does this job.

RSX also supports ptrace(), signal(), wait(), fork(), spawnve(P_DEBUG).

If you use the DJGPP compiler only under DPMI, gcc + rsx is faster than

gcc + go32.

Mail to: [email protected]

Home FTP-server: ftp.uni-bielefeld.de

RSX directory is: /pub/systems/msdos/misc

This server also contains:

RSXWDK 2 (available in 12/94)

building 32 bit MS-Windows Apps with EMX+GCC and DJGPP

includes the windows extender RSXW32

RSXWIN 2a

running EMX text-mode programs in a MS-Windows 3.1 window
