Dec 05 2017
 
RSX DPMI-extender for EMX and DJGPP apps rel 5.
File DPMIGCC5.ZIP from The Programmer’s Corner in
Category Recently Uploaded Files
File Name File Size Zip Size Zip Type
RSX 0 0 stored
BIN 0 0 stored
BINDDJ.EXE 10370 7340 deflated
DEB.EXE 69636 31995 deflated
DPMIFUN.EXE 53252 20768 deflated
DPMIINFO.EXE 28676 11222 deflated
EMXLDPMI.EXE 1324 571 deflated
PAKRSX.BAT 69 60 deflated
RSX.EXE 99724 48927 deflated
RSXOPT.EXE 11402 7922 deflated
STUBDJ.EXE 6204 4740 deflated
DOC 0 0 stored
COPYING 18321 6637 deflated
COPYING.RSX 1505 685 deflated
DPMILB11.TXT 15576 4637 deflated
FAQ.TXT 7625 3089 deflated
HISTORY.TXT 4694 2094 deflated
INSTALL.TXT 16154 5982 deflated
KRNLDEB.TXT 1830 973 deflated
README.TXT 2238 1054 deflated
RSX4EMX.TXT 4426 1634 deflated
SOURCE 0 0 stored
ADOSX32.H 499 199 deflated
ASM16 0 0 stored
ADOSX32.ASM 8486 2368 deflated
COPY32.ASM 6412 1136 deflated
DPMI10.ASM 8019 1549 deflated
DPMI16.ASM 23238 4346 deflated
EXCEP32.ASM 8811 2514 deflated
FPU.ASM 2051 796 deflated
REGS386.INC 954 227 deflated
SWITCH.ASM 1915 696 deflated
TRANS.INC 245 98 deflated
ASM32 0 0 stored
ADOSX32.S 6880 1552 deflated
COPY32.S 3172 569 deflated
CRT0.S 850 350 deflated
DPMI.C 14920 2354 deflated
DPMI10.S 6894 1479 deflated
EXCEP32.S 7165 2099 deflated
FPU.C 1595 614 deflated
REGS386.INC 1591 398 deflated
BUILD 0 0 stored
BUILD 50 48 deflated
CDOSX32.C 19329 4950 deflated
CDOSX32.H 435 243 deflated
COPY32.H 746 289 deflated
DEB 0 0 stored
ANSI.H 229 123 deflated
BREAKP.C 2234 648 deflated
BREAKP.H 392 193 deflated
COFF.H 9555 3194 deflated
DEB.C 14078 4107 deflated
DPMI.C 7454 2129 deflated
DPMI.H 74 65 deflated
INPUT.C 5104 1476 deflated
MAKEFILE 391 236 deflated
STAB.H 1151 495 deflated
SYMS.C 19085 5114 deflated
SYMS.H 609 269 deflated
UNASSMBL.C 27512 7889 deflated
UNASSMBL.H 208 113 deflated
DJIO.C 14134 4085 deflated
DJIO.H 317 146 deflated
DJLIBRSX 0 0 stored
FORK.S 137 107 deflated
GETPID.S 107 78 deflated
GETPPID.S 118 80 deflated
KILL.S 176 127 deflated
LIBRSX.A 5354 1606 deflated
MAKEFILE 359 177 deflated
PTRACE.S 338 182 deflated
RAISE.S 167 123 deflated
README 297 165 deflated
SIGNAL.S 221 144 deflated
SPAWNVE.C 4533 1108 deflated
WAIT.S 175 126 deflated
_PROCESS.H 1118 294 deflated
_PTRACE.H 533 206 deflated
_REG.H 467 174 deflated
_SIGNAL.H 657 267 deflated
_USER.H 789 263 deflated
DOSERRNO.C 7829 2091 deflated
DOSERRNO.H 831 303 deflated
DPMI 0 0 stored
DPMI.H 8174 2558 deflated
DPMI10.H 2516 780 deflated
DPMITYPE.H 147 70 deflated
DPMIUTIL.C 3520 1104 deflated
DPMIFUN.C 8690 2790 deflated
DPMIFUN.H 7503 1323 deflated
DPMIFUN2.S 399 202 deflated
DPMIINFO.C 4226 1506 deflated
INPUT.C 5104 1476 deflated
MAKEFILE 968 329 deflated
EXCEP32.H 1466 324 deflated
EXTERNA.H 5269 1010 deflated
FPU-EMU 0 0 stored
BUILD 0 0 stored
CONTROL_.H 1815 602 deflated
CRT0FPU.S 3932 1359 deflated
CRT0FPUW.S 1980 722 deflated
DIV_SMAL.S 1492 440 deflated
ERRORS.C 16569 4309 deflated
EXCEPTIO.H 1798 659 deflated
FPU-EMU.RSP 565 160 deflated
FPU_ARIT.C 3698 569 deflated
FPU_ASM.H 934 314 deflated
FPU_AUX.C 3862 1233 deflated
FPU_EMU.H 5824 1985 deflated
FPU_ENTR.C 19417 5375 deflated
FPU_ETC.C 2998 865 deflated
FPU_PROT.H 3973 844 deflated
FPU_SYST.H 2659 1026 deflated
FPU_TRIG.C 40832 7657 deflated
GET_ADDR.C 8015 1952 deflated
INCLUDE 0 0 stored
ASM 0 0 stored
SEGMENT.H 4775 1146 deflated
LINUX 0 0 stored
CONFIG.H 806 241 deflated
KERNEL.H 338 208 deflated
LINKAGE.H 140 84 deflated
MATH_EMU.H 748 339 deflated
SCHED.H 636 273 deflated
SEGMENT.H 150 82 deflated
SIGNAL.H 1384 588 deflated
STDDEF.H 243 149 deflated
LOAD_STO.C 7284 1771 deflated
MAKEFILE 1475 478 deflated
POLYNOMI.S 3736 1034 deflated
POLY_2XM.C 2811 1058 deflated
POLY_ATA.C 6012 1950 deflated
POLY_DIV.S 2304 487 deflated
POLY_L2.C 7961 2429 deflated
POLY_MUL.S 1739 484 deflated
POLY_SIN.C 4580 1480 deflated
POLY_TAN.C 5023 1370 deflated
PRINTK.C 6890 2095 deflated
PRINTK.H 4218 539 deflated
README 16031 6103 deflated
REG_ADD_.C 7711 1447 deflated
REG_COMP.C 8040 1603 deflated
REG_CONS.C 3397 913 deflated
REG_CONS.H 1167 285 deflated
REG_DIV.S 5464 1539 deflated
REG_LD_S.C 37024 7255 deflated
REG_MUL.C 3266 989 deflated
REG_NORM.S 3232 801 deflated
REG_ROUN.S 17675 4273 deflated
REG_U_AD.S 4180 1412 deflated
REG_U_DI.S 12190 2973 deflated
REG_U_MU.S 3764 1146 deflated
REG_U_SU.S 6387 1888 deflated
STATUS_W.H 2483 840 deflated
VERIFY.C 71 66 deflated
VERSION.H 844 208 deflated
WM_SHRX.S 6156 1241 deflated
WM_SQRT.S 10954 2923 deflated
FPU.H 371 202 deflated
FS.C 13439 3126 deflated
FS.H 2610 863 deflated
GNUAOUT.H 2359 836 deflated
INDENTC.BAT 40 40 stored
KDEB.C 9728 2490 deflated
KDEB.H 318 179 deflated
LIBC.C 10763 2478 deflated
LOADER 0 0 stored
CRT1.ASM 1135 486 deflated
LOAD2.ASM 6822 1900 deflated
LOADER.C 11767 3611 deflated
LOADER.EXE 4488 2489 deflated
LOADER.H 3939 1345 deflated
LOADER.MAP 538 236 deflated
MAKEFILE 522 264 deflated
SLIBCE.LIB 2575 1159 deflated
LOADPRG.C 17104 4752 deflated
LOADPRG.H 342 198 deflated
MAKEFILE 2333 826 deflated
MAKEFILE.MSC 1852 673 deflated
MAKEFILE.WAT 1895 684 deflated
OFILES.RSP 394 139 deflated
PRINTF.C 7801 2295 deflated
PRINTF.H 162 112 deflated
PROCESS.C 22635 6249 deflated
PROCESS.H 6867 2042 deflated
PROCESS.S 25127 4873 deflated
PTRACE.C 4270 1239 deflated
PTRACE.H 455 191 deflated
RMLIB.C 11171 2256 deflated
RMLIB.H 3214 970 deflated
RSX.C 10370 3356 deflated
RSX.H 792 322 deflated
RSXDEB.BAT 309 150 deflated
SAMPLE 0 0 stored
ALLOC.C 7354 2072 deflated
EXEC.C 2363 954 deflated
FORK.C 2840 1077 deflated
FPE.C 466 188 deflated
PIPE.C 1707 663 deflated
SIGNALS.C 16771 4602 deflated
SIGNALS.H 1170 512 deflated
SIGNALS3.C 16888 4705 deflated
START32.C 12856 3375 deflated
START32.H 893 374 deflated
STATEMX.C 3776 1144 deflated
STATEMX.H 836 351 deflated
STUB 0 0 stored
BINDDJ.C 2457 830 deflated
EMXLDPMI.ASM 10045 3727 deflated
EMXLDPMI.EXE 1324 571 deflated
EMXLDPMI.MAK 182 93 deflated
RSXOPT.C 5586 1678 deflated
STUBDJ.C 1685 762 deflated
SYSDEP.C 5080 1842 deflated
SYSDEP.H 293 167 deflated
SYSDEP2.C 1376 552 deflated
SYSDJ.C 14885 4367 deflated
SYSEMX.C 26297 7144 deflated
TERMIO.C 12799 3601 deflated
TERMIO.H 3660 1011 deflated
TIMEDOS.C 3799 1095 deflated
TIMEDOS.H 851 322 deflated
VERSION.H 284 165 deflated
TPCREAD.ME 199 165 deflated


Contents of the README file


The library LIBRSX.A contains signal, spawnve and ptrace support for DJGPP.

Copy the files
LIBRSX.A -> \djgpp\lib

_PROCESS.H -> \djgpp\include\sys
_USER.H -> \djgpp\include\sys
_PTRACE.H -> \djgpp\include\sys
_REG.H -> \djgpp\include\sys
_SIGNAL.H -> \djgpp\include\sys

+---------------------------------------------------------------------------+
| wm-FPU-emu an FPU emulator for 80386 and 80486SX microprocessors.         |
|                                                                           |
| Copyright (C) 1992,1993,1994                                              |
| W. Metzenthen, 22 Parker St, Ormond, Vic 3163,                            |
| Australia. E-mail [email protected]                                       |
|                                                                           |
| This program is free software; you can redistribute it and/or modify      |
| it under the terms of the GNU General Public License version 2 as         |
| published by the Free Software Foundation.                                |
|                                                                           |
| This program is distributed in the hope that it will be useful,           |
| but WITHOUT ANY WARRANTY; without even the implied warranty of            |
| MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the              |
| GNU General Public License for more details.                              |
|                                                                           |
| You should have received a copy of the GNU General Public License         |
| along with this program; if not, write to the Free Software               |
| Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.                 |
|                                                                           |
+---------------------------------------------------------------------------+



wm-FPU-emu is an FPU emulator for Linux. It is derived from wm-emu387
which is my 80387 emulator for djgpp (gcc under msdos); wm-emu387 was
in turn based upon emu387 which was written by DJ Delorie for djgpp.
The interface to the Linux kernel is based upon the original Linux
math emulator by Linus Torvalds.

My target FPU for wm-FPU-emu is that described in the Intel486
Programmer's Reference Manual (1992 edition). Unfortunately, numerous
facets of the functioning of the FPU are not well covered in the
Reference Manual. The information in the manual has been supplemented
with measurements on real 80486's. Unfortunately, it is simply not
possible to be sure that all of the peculiarities of the 80486 have
been discovered, so there are always likely to be obscure differences
in the detailed behaviour of the emulator and a real 80486.

wm-FPU-emu does not implement all of the behaviour of the 80486 FPU.
See "Limitations" later in this file for a list of some differences.

Please report bugs, etc to me at:
[email protected]
or at:
[email protected]


--Bill Metzenthen
March 1994


----------------------- Internals of wm-FPU-emu -----------------------

Numeric algorithms:
(1) Add, subtract, and multiply. Nothing remarkable in these.
(2) Divide has been tuned to get reasonable performance. The algorithm
is not the obvious one which most people seem to use, but is designed
to take advantage of the characteristics of the 80386. I expect that
it has been invented many times before I discovered it, but I have not
seen it. It is based upon one of those ideas which one carries around
for years without ever bothering to check it out.
(3) The sqrt function has been tuned to get good performance. It is based
upon Newton's classic method. Performance was improved by capitalizing
upon the properties of Newton's method, and the code is once again
structured taking account of the 80386 characteristics.
(4) The trig, log, and exp functions are based in each case upon quasi-
"optimal" polynomial approximations. My definition of "optimal" was
based upon getting good accuracy with reasonable speed.
(5) The argument reducing code for the trig function effectively uses
a value of pi which is accurate to more than 128 bits. As a consequence,
the reduced argument is accurate to more than 64 bits for arguments up
to a few pi, and accurate to more than 64 bits for most arguments,
even for arguments approaching 2^63. This is far superior to an
80486, which uses a value of pi which is accurate to 66 bits.
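
Item (3) above names Newton's method as the basis of the sqrt code. The
following is a minimal sketch of that iteration on plain 64-bit integers,
not the emulator's own routine (which is written in 80386 assembly and works
on extended-precision significands). The function name isqrt_newton and the
choice of starting estimate are assumptions made for this illustration.

```c
#include <stdint.h>

/* Illustrative only: floor(sqrt(n)) by Newton's iteration on integers.
 * The name and the starting estimate are assumptions for this sketch. */
uint64_t isqrt_newton(uint64_t n)
{
    if (n == 0)
        return 0;

    /* Count the significant bits of n. */
    int bits = 0;
    for (uint64_t t = n; t != 0; t >>= 1)
        bits++;

    /* Start from a power of two that is >= sqrt(n); from an over-estimate
     * the Newton sequence decreases monotonically towards the root. */
    uint64_t x = (uint64_t)1 << ((bits + 1) / 2);

    /* Newton step for f(x) = x*x - n:  x' = (x + n/x) / 2. */
    uint64_t y = (x + n / x) / 2;
    while (y < x) {
        x = y;
        y = (x + n / x) / 2;
    }
    return x;   /* the first x that the step no longer decreases */
}
```

Each step roughly doubles the number of correct bits once the estimate is
close, which is the property of Newton's method that makes a good starting
estimate worthwhile. For example, isqrt_newton(16) returns 4 and
isqrt_newton(15) returns 3.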

The code of the emulator is complicated slightly by the need to
account for a limited form of re-entrancy. Normally, the emulator will
emulate each FPU instruction to completion without interruption.
However, it may happen that when the emulator is accessing the user
memory space, swapping may be needed. In this case the emulator may be
temporarily suspended while disk i/o takes place. During this time
another process may use the emulator, thereby changing some static
variables (e.g. FPU_st0_ptr). The code which accesses user memory
is confined to five files:
fpu_entry.c
reg_ld_str.c
load_store.c
get_address.c
errors.c
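
The re-entrancy hazard described above can be sketched as follows. This is
one common way to guard shared static state across a call that may sleep;
the README does not say how the emulator itself does it. The type fpu_reg,
the variable st0_ptr and both function names are hypothetical stand-ins.

```c
#include <stddef.h>

/* Hypothetical register type; the emulator's real layout is not given here. */
typedef struct { unsigned long sigl, sigh; short exp; } fpu_reg;

fpu_reg regs[8];
fpu_reg *st0_ptr = &regs[0];   /* stand-in for a static such as FPU_st0_ptr */

/* Call `access` (which may sleep while a page is swapped in) and make sure
 * the shared pointer holds its old value again when we resume. */
int access_user_memory(int (*access)(void *), void *arg)
{
    fpu_reg *saved = st0_ptr;  /* save before the call that may sleep */
    int rc = access(arg);
    st0_ptr = saved;           /* undo any change made while suspended */
    return rc;
}

/* Simulates another process using the emulator while we were suspended. */
int simulated_swap_in(void *arg)
{
    (void)arg;
    st0_ptr = &regs[3];
    return 0;
}
```

Confining user-memory access to a handful of files, as the list above does,
keeps the number of places that need this kind of care small.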

----------------------- Limitations of wm-FPU-emu -----------------------

There are a number of differences between the current wm-FPU-emu
(version beta 1.11) and the 80486 FPU (apart from bugs). Some of the
more important differences are listed below:

The Roundup flag does not have much meaning for the transcendental
functions and its 80486 value with these functions is likely to differ
from its emulator value.

In a few rare cases the Underflow flag obtained with the emulator will
be different from that obtained with an 80486. This occurs when the
following conditions apply simultaneously:
(a) the operands have a higher precision than the current setting of the
precision control (PC) flags.
(b) the underflow exception is masked.
(c) the magnitude of the exact result (before rounding) is less than 2^-16382.
(d) the magnitude of the final result (after rounding) is exactly 2^-16382.
(e) the magnitude of the exact result would be exactly 2^-16382 if the
operands were rounded to the current precision before the arithmetic
operation was performed.
If all of these apply, the emulator will set the Underflow flag but a real
80486 will not.

NOTE: Certain formats of Extended Real are UNSUPPORTED. They are
unsupported by the 80486. They are the Pseudo-NaNs, Pseudoinfinities,
and Unnormals. None of these will be generated by an 80486 or by the
emulator. Do not use them. The emulator treats them differently in
detail from the way an 80486 does.

The emulator treats PseudoDenormals differently from an 80486. These
numbers are in fact properly normalised numbers with the exponent
offset by 1, and the emulator treats them as such. Unlike the 80486,
the emulator does not generate a Denormal Operand exception for these
numbers. The arithmetical results produced when using such a number as
an operand are the same for the emulator and a real 80486 (apart from
any slight precision difference for the transcendental functions).
Neither the emulator nor an 80486 produces one of these numbers as the
result of any arithmetic operation. An 80486 can keep one of these
numbers in an FPU register with its identity as a PseudoDenormal, but
the emulator will not; they are always converted to a valid number.
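
The encodings named in the two paragraphs above can be told apart from the
exponent field and the explicit integer bit of the 80-bit Extended Real
format. The classifier below is a sketch for illustration; the enum and
function names are assumptions and do not come from the emulator source.

```c
#include <stdint.h>

/* Names below are hypothetical; only the bit layout is the x87 format. */
typedef enum {
    X_ZERO, X_DENORMAL, X_PSEUDO_DENORMAL, X_NORMAL, X_UNNORMAL,
    X_INFINITY, X_NAN, X_PSEUDO_INFINITY, X_PSEUDO_NAN
} xreal_class;

/* sign_exp: sign bit plus 15-bit biased exponent (top 16 bits of the value).
 * significand: 64 bits, bit 63 being the explicit integer bit. */
xreal_class classify_xreal(uint16_t sign_exp, uint64_t significand)
{
    uint16_t exp = sign_exp & 0x7FFF;
    int integer_bit = (int)(significand >> 63);
    uint64_t fraction = significand & 0x7FFFFFFFFFFFFFFFull;

    if (exp == 0) {
        /* Integer bit set with a zero exponent field: pseudo-denormal. */
        if (integer_bit)
            return X_PSEUDO_DENORMAL;
        return fraction ? X_DENORMAL : X_ZERO;
    }
    if (exp == 0x7FFF) {
        if (integer_bit)
            return fraction ? X_NAN : X_INFINITY;
        /* Maximum exponent with the integer bit clear: unsupported forms. */
        return fraction ? X_PSEUDO_NAN : X_PSEUDO_INFINITY;
    }
    /* Ordinary exponent: the integer bit must be set for a valid number. */
    return integer_bit ? X_NORMAL : X_UNNORMAL;
}
```

For example, classify_xreal(0x3FFF, 0x8000000000000000) is X_NORMAL (the
value 1.0), while the same significand with an exponent field of 0 is
X_PSEUDO_DENORMAL.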

Self modifying code can cause the emulator to fail. An example of such
code is:
movl %esp,(%ebx)
fld1
The FPU instruction may be (usually will be) loaded into the pre-fetch
queue of the cpu before the mov instruction is executed. If the
destination of the 'movl' overlaps the FPU instruction then the bytes
in the prefetch queue and memory will be inconsistent when the FPU
instruction is executed. The emulator will be invoked but will not be
able to find the instruction which caused the device-not-present
exception. For this case, the emulator cannot emulate the behaviour of
an 80486DX.

Handling of the address size override prefix byte (0x67) has not been
extensively tested yet. A major problem exists because using it in
vm86 mode can cause a general protection fault. Address offsets
greater than 0xffff appear to be illegal in vm86 mode but are quite
acceptable (and work) in real mode. A small test program developed to
check the addressing, and which runs successfully in real mode,
crashes dosemu under Linux and also brings Windows down with a general
protection fault message when run under the MS-DOS prompt of Windows
3.1. (The program simply reads data from a valid address.)


----------------------- Performance of wm-FPU-emu -----------------------

Speed.
-----

The speed of floating point computation with the emulator will depend
upon instruction mix. Relative performance is best for the instructions
which require most computation. The simple instructions are adversely
affected by the fpu instruction trap overhead.


Timing: Some simple timing tests have been made on the emulator functions.
The times include load/store instructions. All times are in microseconds
measured on a 33MHz 386 with 64k cache. The Turbo C tests were under
ms-dos, the next two columns are for emulators running with the djgpp
ms-dos extender. The final column is for wm-FPU-emu in Linux 0.97,
using libm4.0 (hard).

function    Turbo C        djgpp 1.06        WM-emu387      wm-FPU-emu

+           60.5           154.8             76.5           139.4
-           61.1-65.5      157.3-160.8       76.2-79.5      142.9-144.7
*           71.0           190.8             79.6           146.6
/           61.2-75.0      261.4-266.9       75.3-91.6      142.2-158.1

sin()       310.8          4692.0            319.0          398.5
cos()       284.4          4855.2            308.0          388.7
tan()       495.0          8807.1            394.9          504.7
atan()      328.9          4866.4            601.1          419.5-491.9

sqrt()      128.7          crashed           145.2          227.0
log()       413.1-419.1    5103.4-5354.21    254.7-282.2    409.4-437.1
exp()       479.1          6619.2            469.1          850.8
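
The README does not include the program used for these measurements. A
timing loop of the following kind is one way to obtain per-operation figures
in microseconds; the function names and the use of the standard clock() call
are assumptions of this sketch, not the author's method.

```c
#include <time.h>

/* Illustrative only: average time per call of `op`, in microseconds.
 * Like the figures above, the result includes load/store and loop overhead. */
double usec_per_op(double (*op)(double, double), double a, double b,
                   long iterations)
{
    volatile double sink = 0.0;  /* keeps the compiler from dropping the call */
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        sink = op(a, b);
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) * 1e6 / CLOCKS_PER_SEC / (double)iterations;
}

/* Two sample operations to time. */
double op_add(double a, double b) { return a + b; }
double op_div(double a, double b) { return a / b; }
```

A call such as usec_per_op(op_div, 1.0, 3.0, 1000000L) gives one figure per
operation; where the time depends on the operands, repeating it with
different operands gives a range like those in the table.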


The performance under Linux is improved by the use of look-ahead code.
The following results show the improvement which is obtained under
Linux due to the look-ahead code. Also given are the times for the
original Linux emulator with the 4.1 'soft' lib.

[ Linus' note: I changed look-ahead to be the default under linux, as
there was no reason not to use it after I had edited it to be
disabled during tracing ]

            wm-FPU-emu w     original w
            look-ahead       'soft' lib
+           106.4            190.2
-           108.6-111.6      192.4-216.2
*           113.4            193.1
/           108.8-124.4      700.1-706.2

sin()       390.5            2642.0
cos()       381.5            2767.4
tan()       496.5            3153.3
atan()      367.2-435.5      2439.4-3396.8

sqrt()      195.1            4732.5
log()       358.0-387.5      3359.2-3390.3
exp()       619.3            4046.4


These figures are now somewhat out-of-date. The emulator has become
progressively slower for most functions as more of the 80486 features
have been implemented.


----------------------- Accuracy of wm-FPU-emu -----------------------


Accuracy: The following table gives the accuracy of the sqrt(), trig
and log functions. Each function was tested at about 400 points. Ideal
results would be 64 bits. The reduced accuracy of cos() and tan() for
arguments greater than pi/4 can be thought of as being due to the
precision of the argument x; e.g. an argument of pi/2-(1e-10) which is
accurate to 64 bits can result in a relative accuracy in cos() of about
64 + log2(cos(x)) = 31 bits. Results for the Turbo C emulator are given
in the last column.


Function    Tested x range           Worst result               Turbo C
                                     (relative bits)

sqrt(x)     1 .. 2                   64.1                       63.2
atan(x)     1e-10 .. 200             62.6                       62.8
cos(x)      0 .. pi/2-(1e-10)        63.2 (x <= pi/4)           62.4
                                     35.2 (x = pi/2-(1e-10))    31.9
sin(x)      1e-10 .. pi/2            63.0                       62.8
tan(x)      1e-10 .. pi/2-(1e-10)    62.4 (x <= pi/4)           62.1
                                     35.2 (x = pi/2-(1e-10))    31.9
exp(x)      0 .. 1                   63.1                       62.9
log(x)      1+1e-6 .. 2              62.4                       62.1
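
The figure of about 31 bits quoted for cos() at x = pi/2-(1e-10) can be
checked by hand. The working below is a sketch that uses the small-angle
approximation sin(e) ~ e, with e = 1e-10 taken from the tested range above.

```latex
\begin{align*}
\cos\left(\tfrac{\pi}{2} - \varepsilon\right)
    &= \sin\varepsilon \approx \varepsilon,
    \qquad \varepsilon = 10^{-10} \\
\log_2\left(10^{-10}\right)
    &= -10 \log_2 10 \approx -33.2 \\
64 + \log_2\left(\cos x\right)
    &\approx 64 - 33.2 = 30.8 \approx 31 \text{ bits}
\end{align*}
```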


As of version 1.3 of the emulator, the accuracy of the basic
arithmetic has been improved (by a small fraction of a bit). Care has
been taken to ensure full accuracy of the rounding of the basic
arithmetic functions (+,-,*,/,and fsqrt), and they all now produce
results which are exact to the 64th bit (unless there are any bugs
left). To ensure this, it was necessary to effectively get information
of up to about 128 bits precision. The emulator now passes the
"paranoia" tests (compiled with gcc 2.3.3) for 'float' variables (24
bit precision numbers) when precision control is set to 24, 53 or 64
bits, and for 'double' variables (53 bit precision numbers) when
precision control is set to 53 bits (a properly performing FPU cannot
pass the 'paranoia' tests for 'double' variables when precision
control is set to 64 bits).

For version 1.5, the accuracy of fprem and fprem1 has been improved.
These functions now produce exact results. The code for reducing the
argument for the trig functions (fsin, fcos, fptan and fsincos) has
been improved and now effectively uses a value for pi which is
accurate to more than 128 bits precision. As a consequence, the
accuracy of these functions for large arguments has been dramatically
improved (and is now very much better than an 80486 FPU). There is
also now no degradation of accuracy for fcos and fptan for operands
close to pi/2. Measured results are (note that the definition of
accuracy has changed slightly from that used for the above table):

Function    Tested x range           Worst result
                                     (absolute bits)

cos(x)      0 .. 9.22e+18            62.0
sin(x)      1e-16 .. 9.22e+18        62.1
tan(x)      1e-16 .. 9.22e+18        61.8

It is possible with some effort to find very large arguments which
give much degraded precision. For example, the integer number
8227740058411162616.0
is within about 10e-7 of a multiple of pi. To find the tan (for
example) of this number to 64 bits precision it would be necessary to
have a value of pi which had about 150 bits precision. The FPU
emulator computes the result to about 42.6 bits precision (the correct
result is about -9.739715e-8). On the other hand, an 80486 FPU returns
0.01059, which in relative terms is hopelessly inaccurate.

For arguments close to critical angles (which occur at multiples of
pi/2) the emulator is more accurate than an 80486 FPU. For very large
arguments, the emulator is far more accurate.
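
The effect of the precision of pi on argument reduction can be shown on a
small scale. The sketch below uses ordinary double arithmetic and a
two-constant (Cody-Waite style) split of pi; it is not the emulator's
128-bit reduction code, and all names in it are assumptions made for this
illustration.

```c
/* Illustrative only: reduction of x to the nearest multiple of pi.
 * PI_HI = 201/64 has 8 significant bits, so k * PI_HI is exact in a double
 * for any k below 2^45. PI_LO is the remainder pi - PI_HI. */
#define PI_HI 3.140625
#define PI_LO 9.67653589793238462643e-4
#define PI_D  3.14159265358979323846   /* pi rounded to one double */

/* k = round(x / pi) */
long long nearest_multiple(double x)
{
    double q = x / PI_D;
    return (long long)(q >= 0.0 ? q + 0.5 : q - 0.5);
}

/* One constant: the rounding error in PI_D is multiplied by k. */
double reduce_naive(double x)
{
    long long k = nearest_multiple(x);
    return x - (double)k * PI_D;
}

/* Two constants: subtract the exact product first, then the correction. */
double reduce_split(double x)
{
    long long k = nearest_multiple(x);
    return (x - (double)k * PI_HI) - (double)k * PI_LO;
}
```

Because PI_LO is small, its own rounding error is small too, so PI_HI and
PI_LO together carry roughly ten more bits of pi than PI_D alone. In either
version the error in the reduced argument grows in proportion to k, which is
why a more precise pi matters most for very large arguments. As a small
check, reduce_split(100.0) returns about -0.530965, that is, 100 - 32*pi.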

------------------------- Contributors -------------------------------

A number of people have contributed to the development of the
emulator, often by just reporting bugs, sometimes with suggested
fixes, and a few kind people have provided me with access in one way
or another to an 80486 machine. Contributors include (to those people
who I may have forgotten, please forgive me):

Linus Torvalds
[email protected]
[email protected]
Nick Holloway, [email protected]
Hermano Moura, [email protected]
Jon Jagger, [email protected]
Lennart Benschop
Brian Gallew, [email protected]
Thomas Staniszewski, [email protected]
Martin Howell, [email protected]
M Saggaf, [email protected]
Peter Barker, [email protected]
[email protected]
Dan Russel, [email protected]
Daniel Carosone, [email protected]
[email protected]
Hamish Coleman, [email protected]
Bruce Evans, [email protected]
Timo Korvola, [email protected]
Rick Lyons, [email protected]

...and numerous others who responded to my request for help with
a real 80486.



Contents of the README.TXT file


The Library LIBRSX.A contains signal,spawnve and ptrace stuff for DJGPP.

Copy the files
LIBRSX.A -> \djgpp\lib

_PROCESS.H -> \djgpp\include\sys
_USER.H -> \djgpp\include\sys
_PTRACE.H -> \djgpp\include\sys
_REG.H -> \djgpp\include\sys
_SIGNAL.H -> \djgpp\include\sys

+---------------------------------------------------------------------------+
| wm-FPU-emu an FPU emulator for 80386 and 80486SX microprocessors. |
| |
| Copyright (C) 1992,1993,1994 |
| W. Metzenthen, 22 Parker St, Ormond, Vic 3163, |
| Australia. E-mail [email protected] |
| |
| This program is free software; you can redistribute it and/or modify |
| it under the terms of the GNU General Public License version 2 as |
| published by the Free Software Foundation. |
| |
| This program is distributed in the hope that it will be useful, |
| but WITHOUT ANY WARRANTY; without even the implied warranty of |
| MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
| GNU General Public License for more details. |
| |
| You should have received a copy of the GNU General Public License |
| along with this program; if not, write to the Free Software |
| Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. |
| |
+---------------------------------------------------------------------------+



wm-FPU-emu is an FPU emulator for Linux. It is derived from wm-emu387
which is my 80387 emulator for djgpp (gcc under msdos); wm-emu387 was
in turn based upon emu387 which was written by DJ Delorie for djgpp.
The interface to the Linux kernel is based upon the original Linux
math emulator by Linus Torvalds.

My target FPU for wm-FPU-emu is that described in the Intel486
Programmer's Reference Manual (1992 edition). Unfortunately, numerous
facets of the functioning of the FPU are not well covered in the
Reference Manual. The information in the manual has been supplemented
with measurements on real 80486's. Unfortunately, it is simply not
possible to be sure that all of the peculiarities of the 80486 have
been discovered, so there is always likely to be obscure differences
in the detailed behaviour of the emulator and a real 80486.

wm-FPU-emu does not implement all of the behaviour of the 80486 FPU.
See "Limitations" later in this file for a list of some differences.

Please report bugs, etc to me at:
[email protected]
or at:
[email protected]


--Bill Metzenthen
March 1994


----------------------- Internals of wm-FPU-emu -----------------------

Numeric algorithms:
(1) Add, subtract, and multiply. Nothing remarkable in these.
(2) Divide has been tuned to get reasonable performance. The algorithm
is not the obvious one which most people seem to use, but is designed
to take advantage of the characteristics of the 80386. I expect that
it has been invented many times before I discovered it, but I have not
seen it. It is based upon one of those ideas which one carries around
for years without ever bothering to check it out.
(3) The sqrt function has been tuned to get good performance. It is based
upon Newton's classic method. Performance was improved by capitalizing
upon the properties of Newton's method, and the code is once again
structured taking account of the 80386 characteristics.
(4) The trig, log, and exp functions are based in each case upon quasi-
"optimal" polynomial approximations. My definition of "optimal" was
based upon getting good accuracy with reasonable speed.
(5) The argument reducing code for the trig function effectively uses
a value of pi which is accurate to more than 128 bits. As a consequence,
the reduced argument is accurate to more than 64 bits for arguments up
to a few pi, and accurate to more than 64 bits for most arguments,
even for arguments approaching 2^63. This is far superior to an
80486, which uses a value of pi which is accurate to 66 bits.

The code of the emulator is complicated slightly by the need to
account for a limited form of re-entrancy. Normally, the emulator will
emulate each FPU instruction to completion without interruption.
However, it may happen that when the emulator is accessing the user
memory space, swapping may be needed. In this case the emulator may be
temporarily suspended while disk i/o takes place. During this time
another process may use the emulator, thereby changing some static
variables (eg FPU_st0_ptr, etc). The code which accesses user memory
is confined to five files:
fpu_entry.c
reg_ld_str.c
load_store.c
get_address.c
errors.c

----------------------- Limitations of wm-FPU-emu -----------------------

There are a number of differences between the current wm-FPU-emu
(version beta 1.11) and the 80486 FPU (apart from bugs). Some of the
more important differences are listed below:

The Roundup flag does not have much meaning for the transcendental
functions and its 80486 value with these functions is likely to differ
from its emulator value.

In a few rare cases the Underflow flag obtained with the emulator will
be different from that obtained with an 80486. This occurs when the
following conditions apply simultaneously:
(a) the operands have a higher precision than the current setting of the
precision control (PC) flags.
(b) the underflow exception is masked.
(c) the magnitude of the exact result (before rounding) is less than 2^-16382.
(d) the magnitude of the final result (after rounding) is exactly 2^-16382.
(e) the magnitude of the exact result would be exactly 2^-16382 if the
operands were rounded to the current precision before the arithmetic
operation was performed.
If all of these apply, the emulator will set the Underflow flag but a real
80486 will not.

NOTE: Certain formats of Extended Real are UNSUPPORTED. They are
unsupported by the 80486. They are the Pseudo-NaNs, Pseudoinfinities,
and Unnormals. None of these will be generated by an 80486 or by the
emulator. Do not use them. The emulator treats them differently in
detail from the way an 80486 does.

The emulator treats PseudoDenormals differently from an 80486. These
numbers are in fact properly normalised numbers with the exponent
offset by 1, and the emulator treats them as such. Unlike the 80486,
the emulator does not generate a Denormal Operand exception for these
numbers. The arithmetical results produced when using such a number as
an operand are the same for the emulator and a real 80486 (apart from
any slight precision difference for the transcendental functions).
Neither the emulator nor an 80486 produces one of these numbers as the
result of any arithmetic operation. An 80486 can keep one of these
numbers in an FPU register with its identity as a PseudoDenormal, but
the emulator will not; they are always converted to a valid number.

Self modifying code can cause the emulator to fail. An example of such
code is:
movl %esp,[%ebx]
fld1
The FPU instruction may be (usually will be) loaded into the pre-fetch
queue of the cpu before the mov instruction is executed. If the
destination of the 'movl' overlaps the FPU instruction then the bytes
in the prefetch queue and memory will be inconsistent when the FPU
instruction is executed. The emulator will be invoked but will not be
able to find the instruction which caused the device-not-present
exception. For this case, the emulator cannot emulate the behaviour of
an 80486DX.

Handling of the address size override prefix byte (0x67) has not been
extensively tested yet. A major problem exists because using it in
vm86 mode can cause a general protection fault. Address offsets
greater than 0xffff appear to be illegal in vm86 mode but are quite
acceptable (and work) in real mode. A small test program developed to
check the addressing, and which runs successfully in real mode,
crashes dosemu under Linux and also brings Windows down with a general
protection fault message when run under the MS-DOS prompt of Windows
3.1. (The program simply reads data from a valid address).


----------------------- Performance of wm-FPU-emu -----------------------

Speed.
-----

The speed of floating point computation with the emulator will depend
upon instruction mix. Relative performance is best for the instructions
which require most computation. The simple instructions are adversely
affected by the fpu instruction trap overhead.


Timing: Some simple timing tests have been made on the emulator functions.
The times include load/store instructions. All times are in microseconds
measured on a 33MHz 386 with 64k cache. The Turbo C tests were under
ms-dos, the next two columns are for emulators running with the djgpp
ms-dos extender. The final column is for wm-FPU-emu in Linux 0.97,
using libm4.0 (hard).

function Turbo C djgpp 1.06 WM-emu387 wm-FPU-emu

+ 60.5 154.8 76.5 139.4
- 61.1-65.5 157.3-160.8 76.2-79.5 142.9-144.7
* 71.0 190.8 79.6 146.6
/ 61.2-75.0 261.4-266.9 75.3-91.6 142.2-158.1

sin() 310.8 4692.0 319.0 398.5
cos() 284.4 4855.2 308.0 388.7
tan() 495.0 8807.1 394.9 504.7
atan() 328.9 4866.4 601.1 419.5-491.9

sqrt() 128.7 crashed 145.2 227.0
log() 413.1-419.1 5103.4-5354.21 254.7-282.2 409.4-437.1
exp() 479.1 6619.2 469.1 850.8


The performance under Linux is improved by the use of look-ahead code.
The following results show the improvement which is obtained under
Linux due to the look-ahead code. Also given are the times for the
original Linux emulator with the 4.1 'soft' lib.

[ Linus' note: I changed look-ahead to be the default under linux, as
there was no reason not to use it after I had edited it to be
disabled during tracing ]

wm-FPU-emu w original w
look-ahead 'soft' lib
+ 106.4 190.2
- 108.6-111.6 192.4-216.2
* 113.4 193.1
/ 108.8-124.4 700.1-706.2

sin() 390.5 2642.0
cos() 381.5 2767.4
tan() 496.5 3153.3
atan() 367.2-435.5 2439.4-3396.8

sqrt() 195.1 4732.5
log() 358.0-387.5 3359.2-3390.3
exp() 619.3 4046.4


These figures are now somewhat out-of-date. The emulator has become
progressively slower for most functions as more of the 80486 features
have been implemented.


----------------------- Accuracy of wm-FPU-emu -----------------------


Accuracy: The following table gives the accuracy of the sqrt(), trig
and log functions. Each function was tested at about 400 points. Ideal
results would be 64 bits. The reduced accuracy of cos() and tan() for
arguments greater than pi/4 can be thought of as being due to the
precision of the argument x; e.g. an argument of pi/2-(1e-10) which is
accurate to 64 bits can result in a relative accuracy in cos() of about
64 + log2(cos(x)) = 31 bits. Results for the Turbo C emulator are given
in the last column.


Function Tested x range Worst result Turbo C
(relative bits)

sqrt(x) 1 .. 2 64.1 63.2
atan(x) 1e-10 .. 200 62.6 62.8
cos(x) 0 .. pi/2-(1e-10) 63.2 (x <= pi/4) 62.4
35.2 (x = pi/2-(1e-10)) 31.9
sin(x) 1e-10 .. pi/2 63.0 62.8
tan(x) 1e-10 .. pi/2-(1e-10) 62.4 (x <= pi/4) 62.1
35.2 (x = pi/2-(1e-10)) 31.9
exp(x) 0 .. 1 63.1 62.9
log(x) 1+1e-6 .. 2 62.4 62.1


As of version 1.3 of the emulator, the accuracy of the basic
arithmetic has been improved (by a small fraction of a bit). Care has
been taken to ensure full accuracy of the rounding of the basic
arithmetic functions (+,-,*,/,and fsqrt), and they all now produce
results which are exact to the 64th bit (unless there are any bugs
left). To ensure this, it was necessary to effectively get information
of up to about 128 bits precision. The emulator now passes the
"paranoia" tests (compiled with gcc 2.3.3) for 'float' variables (24
bit precision numbers) when precision control is set to 24, 53 or 64
bits, and for 'double' variables (53 bit precision numbers) when
precision control is set to 53 bits (a properly performing FPU cannot
pass the 'paranoia' tests for 'double' variables when precision
control is set to 64 bits).

For version 1.5, the accuracy of fprem and fprem1 has been improved.
These functions now produce exact results. The code for reducing the
argument for the trig functions (fsin, fcos, fptan and fsincos) has
been improved and now effectively uses a value for pi which is
accurate to more than 128 bits precision. As a consequence, the
accuracy of these functions for large arguments has been dramatically
improved (and is now very much better than an 80486 FPU). There is
also now no degradation of accuracy for fcos and fptan for operands
close to pi/2. Measured results are (note that the definition of
accuracy has changed slightly from that used for the above table):

Function    Tested x range            Worst result
                                      (absolute bits)

cos(x)      0 .. 9.22e+18             62.0
sin(x)      1e-16 .. 9.22e+18         62.1
tan(x)      1e-16 .. 9.22e+18         61.8

It is possible with some effort to find very large arguments which
give much degraded precision. For example, the integer number
8227740058411162616.0
is within about 1e-7 of a multiple of pi. To find the tan (for
example) of this number to 64 bits precision it would be necessary to
have a value of pi which had about 150 bits precision. The FPU
emulator computes the result to about 42.6 bits precision (the correct
result is about -9.739715e-8). On the other hand, an 80486 FPU returns
0.01059, which in relative terms is hopelessly inaccurate.

For arguments close to critical angles (which occur at multiples of
pi/2) the emulator is more accurate than an 80486 FPU. For very large
arguments, the emulator is far more accurate.

------------------------- Contributors -------------------------------

A number of people have contributed to the development of the
emulator, often by just reporting bugs, sometimes with suggested
fixes, and a few kind people have provided me with access in one way
or another to an 80486 machine. Contributors include (to those people
who I may have forgotten, please forgive me):

Linus Torvalds
[email protected]
[email protected]
Nick Holloway, [email protected]
Hermano Moura, [email protected]
Jon Jagger, [email protected]scp.ac.uk
Lennart Benschop
Brian Gallew, [email protected]
Thomas Staniszewski, [email protected]
Martin Howell, [email protected]
M Saggaf, [email protected]
Peter Barker, [email protected]
[email protected]
Dan Russel, [email protected]
Daniel Carosone, [email protected]
[email protected]
Hamish Coleman, [email protected]
Bruce Evans, [email protected]
Timo Korvola, [email protected]
Rick Lyons, [email protected]

...and numerous others who responded to my request for help with
a real 80486.

--------------------------------------------------------------------------------
README.TXT Release: RSX 5 (c) Rainer Schnitker Dec 1994
--------------------------------------------------------------------------------
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License version 2 as
published by the Free Software Foundation.
--------------------------------------------------------------------------------

This is RSX, the DPMI-extender for EMX and DJGPP programs.
RSX can run GCC programs under DPMI servers such as MS-Windows 3.1
and emulates a missing 387/487.

News in release 5:
- 32 bit RSX extender default now
- pipes, software scheduler; 'gcc -pipe' works
- rsx can run Emacs 19.27 (emx)
- new emx syscalls (select, bsd signals, pipe)
- tested with OS/2 Warp; _memaccess() workaround
- int10 vesa support
- bugs fixed


UNZIP RSX-ARCHIVE:
First delete any older RSX versions.
Then unpack this archive with unzip50 or pkunzip2:

C:\> UNZIP A:\DPMIGCC5.ZIP
C:\> PKUNZIP -d -) A:\DPMIGCC5.ZIP


Please read INSTALL.TXT in this archive before using this software.


WHY USE RSX FOR EMX PROGRAMS:
-----------------------------
RSX can run unmodified EMX 0.8-0.9a programs under DPMI 0.9/1.0.
The compiler and other EMX programs can run in an MS-Windows or OS/2 DOS box.
RSX also supports system calls that are not available under EMX+DOS.


WHY USE RSX FOR DJGPP PROGRAMS:
-------------------------------
RSX can run the DJGPP 1.08-1.11 compiler under DPMI 0.9/1.0.
You don't need a coprocessor; an FPU emulator does this job.
RSX also supports ptrace(), signal(), wait(), fork() and spawnve(P_DEBUG).
If you use the DJGPP compiler only under DPMI, gcc + rsx is faster than
gcc + go32.
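
As an illustration of the kind of program the fork() and wait() support
makes possible, here is a minimal sketch written against the usual POSIX
headers. It is an assumption that the RSX library for DJGPP provides
these headers and macros in the same form; the function name
child_exit_status is made up for this example.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code` and return the exit status the
 * parent collects with wait(), or -1 if anything goes wrong.
 * Assumes POSIX-style fork()/wait(); not taken from the RSX sources. */
int child_exit_status(int code)
{
    int status;
    pid_t pid;

    assert(code >= 0 && code <= 255);
    pid = fork();
    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(code);            /* child: terminate immediately */
    if (wait(&status) != pid)   /* parent: reap the child */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, child_exit_status(7) returns 7 when the child runs and
exits normally.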



Mail to: [email protected]


Home FTP-server: ftp.uni-bielefeld.de
RSX directory is: /pub/systems/msdos/misc


This server also contains:
RSXWDK 2 (available in 12/94)
building 32 bit MS-Windows Apps with EMX+GCC and DJGPP
includes the windows extender RSXW32
RSXWIN 2a
running EMX text-mode programs in a MS-Windows 3.1 window


