Archive   : QWHITE13.ZIP
Filename : EXCEPT13.TEC

 

ID:13 QEMM-386: Exception #13 Explained
Quarterdeck Technical Note #142 Filename: EXCEPT13.TEC
by Michael Bolton CompuServe: EXCEPT.TEC
Last revised: 3/02/92 Category: QEMM

Subject: A detailed explanation of what QEMM-386's Exception #12 and Exception
#13 messages mean, why they are reported, and some of the steps that
can be taken to identify their causes. See also EX13FLOW.TEC for
troubleshooting suggestions.


Q. What is an Exception #13? What is an Exception #12?
Q. What does the QEMM Exception message mean? How can it help me?

Users of QEMM-386 may sometimes encounter a report that an attempt has been
made to execute an invalid instruction. It is almost certain that QEMM-386,
in and of itself, is not the cause of Exception #13 problems, though
QEMM-386's memory management may come into conflict with other hardware and
software on your system.

Quarterdeck Technical Note #232, Exception #13 Advanced Troubleshooting
(EX13FLOW.TEC) is designed to resolve conflicts in which QEMM-386 may be
involved. This technical note is its companion; here we explain in detail
what a processor exception is, how you can interpret the information provided
by the exception report, and what you can do to remedy the situation in the
unhappy event that the techniques in EX13FLOW.TEC don't provide relief from
the problem.

To answer the questions above, it's worthwhile to examine the Exception #13
report bit by bit.

"The processor has notified QEMM that an attempt has been made to execute an
invalid instruction..."

Exceptions are the processor's response to unusual, invalid, or special
conditions in the normal operation of the 80386 processor and others in its
family. (The 80386 family includes the 80386SX, the 80386DX, the 80486SX, and
the 80486DX processors; their memory management architecture is essentially
the same. In this document, the term "386" refers to any and all of these
processors.) Exceptions cause the 386 processor to stop what it's doing and
to try to react to the condition that caused the exception. QEMM-386 is
designed to capture some of these exceptions -- particularly those caused by
protection faults or invalid instructions, which could cause a program or the
entire system to crash -- and display a report to the user. When the
processor encounters an instruction that it cannot or is not permitted to
execute, it passes control to the protected mode interrupt 13 (decimal)
handler. QEMM-386's protected mode INT 13 handler posts the Exception #13
message. Neither DOS nor Microsoft's EMM386.EXE has a protected mode
interrupt 13 handler, so if an exception occurs using only DOS or EMM386.EXE,
your system simply crashes and you have no report.


Q. What causes an Exception #12 or Exception #13?

"...This may be due to an error in one of your programs, a conflict between
two pieces of software, or a conflict between a piece of hardware and a piece
of software...."

The exception reported is most commonly #13, the General Protection Fault
exception. This indicates that a program has tried to execute an invalid or
privileged instruction. On the 386 processor, programs can run at varying
privilege levels, so that the processor can better protect application
programs (which generally run at lower privilege levels) from crashing the
operating system or control program (which typically runs at the highest
privilege level). DOS and QEMM-386 do not enforce this protection, but
QEMM-386 can report when a program running at the lowest privilege level tries
to execute a privileged instruction. The result may be a system crash, but
QEMM-386 does provide a report before the crash happens.

Invalid instructions are harder to classify, for indeed Exception #13 is
something of a catch-all. Some examples of invalid instructions include:

- 386-specific instructions that are disallowed when the processor is in
virtual 8086 mode. The processor is in this mode whenever QEMM-386 is in an
ON state -- essentially when it is providing expanded memory or High RAM.

- A program trying to write data to a segment that has been marked as
executable or read-only (the data could overwrite program code).

- Trying to run program code from a data segment (if data is read as code, it
will be a series of meaningless or nonsensical instructions -- which, if
executed, could jump to invalid addresses or overwrite the operating system)

- Exceeding the limit of a segment. Offsets within a segment in virtual 8086
mode are not permitted to exceed FFFFh (65535 decimal) or to fall below 0.
Neither a program instruction nor a memory reference may span the boundary
of a segment.

It is this last which is the most common; this is a problem also known as
"segment wrap", which we will discuss later. Again, QEMM-386 is designed to
trap and report these errors, but it cannot defend against the system crashes
that they may cause.

Occasionally Exception #12, indicating a stack exception, will be reported.
This is a protection violation very similar to Exception #13, but is one in
which the stack segment is involved in some way. Although no easier to solve,
it is a somewhat less general report than Exception #13.

Very infrequently, an Exception #0 is reported. This is not intentional; it
is usually the result of QEMM's stack being corrupted while QEMM was trying to
report another exception, or is the result of some other system error.

It is important to remember that in the vast majority of cases, QEMM-386 is
not involved with the problem, but is merely reporting it.


Q. What do I do now?

"...It is likely that the system is unstable now and should be rebooted...."

QEMM-386 is designed to offer the user the opportunity to terminate the
offending program, or to reboot the computer, but often the damage has already
been done by the time that the exception is trapped and reported. In this
instance, you may find the computer locked regardless of what you choose. If
the computer is indeed hung, you should write down the information on the
screen and then reboot the machine.

While QEMM-386's exception reports can be cryptic to non-programmers -- or to
programmers who have little experience with assembly language -- the
information that they provide can sometimes be quite helpful. Exception
reports can help you to identify which program has triggered the exception
message, what the invalid instruction was, and the state of the processor's
registers when the error occurred. Armed with this information, you may be
able to help the developer of the offending application to determine the
problem that led to the exception, and thus the developer may be able to
provide a temporary workaround or a permanent fix.

The exception report is divided into three parts --

1) The vector or class of exception, and its location and error code. The
location of the exception indicates the address in memory at which the invalid
instruction was attempted. The program loaded at this address (if indeed a
program is loaded there) should be noted by running Manifest.

Exception #13 at 1B12:0103, error code: 0000

In this example, the program loaded at address 1B12:xxxx is automatically your
suspect. Reboot your system in the same configuration as you had when the
Exception #13 occurred. If the problem happened during an application
program, don't load the application just yet. Load Manifest instead, and have
a look at First Meg / Programs.

Memory Area     Size    Description
03D1 - 0465     2.3K    COMMAND
0466 - 046A     0.1K    (04C0)
046B - 0483     0.4K    COMMAND Environment
0484 - 0487     0.1K    COMMAND Data
0488 - 0498     0.3K    DV Environment
0499 - 04BE     0.6K    DV
04BF - 1A38      85K    DV Data
1A39 - 1A52     0.4K    COMMAND Data
1A53 - 1AE7     2.3K    COMMAND
1AE8 - 1B00     0.4K    COMMAND Environment
1B01 - 7E4F     397K    [Available]

The sample Exception #13 above happened in that Available range, so it was the
program that would have been loaded had we not loaded Manifest -- that is, the
application program. If you have a TSR loaded low, and the Exception #13 is
occurring within that TSR's address space, then it is your suspect, rather than
the application. In any case, the program whose code falls into the range in
which the Exception #13 occurred likely has a problem of some type.
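
The lookup described above can be sketched in a few lines of Python. This is
only an illustration, assuming that the first line of the report and the
Manifest listing have been typed in by hand; the names parse_exception_line
and find_owner are hypothetical and are not part of any Quarterdeck tool. The
sketch treats the Manifest ranges as paragraph (segment) numbers and compares
them with the paragraph that holds the reported address.

```python
import re

# Rows copied from the Manifest First Meg / Programs example above:
# (first segment, last segment, description), segments in hexadecimal.
PROGRAMS = [
    (0x03D1, 0x0465, "COMMAND"),
    (0x0466, 0x046A, "(04C0)"),
    (0x046B, 0x0483, "COMMAND Environment"),
    (0x0484, 0x0487, "COMMAND Data"),
    (0x0488, 0x0498, "DV Environment"),
    (0x0499, 0x04BE, "DV"),
    (0x04BF, 0x1A38, "DV Data"),
    (0x1A39, 0x1A52, "COMMAND Data"),
    (0x1A53, 0x1AE7, "COMMAND"),
    (0x1AE8, 0x1B00, "COMMAND Environment"),
    (0x1B01, 0x7E4F, "[Available]"),
]

REPORT_RE = re.compile(
    r"Exception #(\d+) at ([0-9A-Fa-f]{4}):([0-9A-Fa-f]{4}), "
    r"error code: ([0-9A-Fa-f]{4})"
)

def parse_exception_line(line):
    """Return (vector, segment, offset, error_code) from the report's first line."""
    match = REPORT_RE.search(line)
    if match is None:
        raise ValueError("not an exception report line: %r" % line)
    vector = int(match.group(1))            # the exception number is decimal
    segment, offset, code = (int(match.group(i), 16) for i in (2, 3, 4))
    return vector, segment, offset, code

def find_owner(segment, offset, programs=PROGRAMS):
    """Return the description of the memory area containing segment:offset."""
    paragraph = segment + (offset >> 4)     # 16-byte paragraph holding the address
    for first, last, description in programs:
        if first <= paragraph <= last:
            return description
    return None                             # address is outside the listed areas

vector, segment, offset, code = parse_exception_line(
    "Exception #13 at 1B12:0103, error code: 0000")
print(find_owner(segment, offset))          # [Available]
```

With a listing taken while the application or a suspect TSR is loaded, the
same lookup would name that program instead of [Available].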

2) The second part of the Exception #13 message is the register dump:

AX=0000 BX=0000 CX=0000 DX=0000 SI=FFFF DI=0000 BP=0000
DS=1B12 ES=1B12 SS=1B12 SP=FFFE Flags=7246

The registers are the temporary storage areas on the 80386 chip which are used
for calculations and addressing. Each register shown in the report is two
bytes (16 bits) in size, so each is capable of holding a value from 0 to FFFF
(hexadecimal), or from 0 to 65535 (decimal).

If any registers here are 0000 or FFFF, it's possible that you could be
looking at a segment wrap. A segment wrap happens whenever a program attempts
to access -- read from or write to -- something beyond the limit of a segment.
A word value consists of two adjacent bytes; if a word value were to begin at
FFFF (which is the last byte of a segment), the second byte of that value will
be outside the segment -- and an attempt to read from or write to that word
will thus cause a protection violation. Similarly, a doubleword is four
adjacent bytes; if any of the last three bytes are outside of the segment
limit, a segment wrap and a protection violation will occur when an access is
attempted.
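
The rule for words and doublewords comes down to one comparison: does the last
byte of the access lie beyond offset FFFF? A minimal Python sketch of that
arithmetic follows; the function name crosses_segment_limit is made up for
this illustration.

```python
SEGMENT_LIMIT = 0xFFFF  # highest valid offset in a virtual 8086 mode segment

def crosses_segment_limit(offset, size):
    """True if an access of `size` bytes starting at `offset` leaves the segment."""
    last_byte = offset + size - 1
    return offset < 0 or last_byte > SEGMENT_LIMIT

print(crosses_segment_limit(0xFFFF, 1))  # False: a byte at FFFF fits
print(crosses_segment_limit(0xFFFF, 2))  # True:  a word at FFFF spills over
print(crosses_segment_limit(0xFFFD, 4))  # True:  a doubleword at FFFD spills over
print(crosses_segment_limit(0xFFFC, 4))  # False: a doubleword at FFFC just fits
```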

On an 8086 processor, it's actually possible for a segment wrap to occur
without a protection violation, simply because the 8086 has no hardware
protection at all. What is the byte after the last byte of a segment? On the
8086, it's the FIRST byte of the same segment. (Non-technical analogy for
poker players: Queen - King - Ace - Two - Three is a straight in the
penny-ante poker game played when the 8086 processor is dealing. The 386
processor is a very strict dealer, and does not permit this.) It is possible
(though unlikely) for a program to continue without a crash on an 8086
processor when two "adjacent" bytes are actually a whole segment apart; it
could theoretically be possible on a 386 too, but the exception is generated
before the memory access can be completed.
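
To see how far apart those two "adjacent" bytes end up on an 8086, the
addresses can be worked out by hand or with the short Python sketch below. It
assumes the usual real-mode arithmetic (segment times 16 plus a 16-bit
offset, on a 20-bit address bus); the function names are hypothetical.

```python
def linear_8086(segment, offset):
    """Physical address on an 8086: segment * 16 + offset (offset is 16 bits)."""
    return ((segment << 4) + (offset & 0xFFFF)) & 0xFFFFF

def word_byte_addresses_8086(segment, offset):
    """Addresses of the low and high bytes of a word read at segment:offset."""
    low = linear_8086(segment, offset)
    high = linear_8086(segment, offset + 1)   # offset FFFF + 1 wraps to 0000
    return low, high

low, high = word_byte_addresses_8086(0x1B12, 0xFFFF)
print("%05X %05X" % (low, high))              # 2B11F 1B120
```

The second byte of the word comes from the very start of the segment, FFFF
bytes below the first one. The 386 raises the exception rather than perform
this access.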

This sort of problem is seen most commonly during a string move -- the program
is copying a whole block of data from one range of addresses to another. You
may not understand this, and actually it doesn't matter if you don't.
Briefly, though, SI stands for Source Index; DI stands for Destination Index.
These two registers are used for string instructions -- instructions that load
or copy information sequentially. String instructions are extremely powerful
and useful, since they allow the developer to deal with large amounts of data
in a single pass. A byte or a word value can be fetched from memory by one
string instruction, dealt with, and then the result can be copied to a new
memory location with a second string instruction -- and all this can be
managed with an extremely tight, fast loop. An entire range of addresses (for
example, in screen memory) can even be filled with a given value using a
single instruction. The catch here is that the string instruction is only
valid as long as the value of the SI or DI register does not fall outside the
range addressable by these registers. If either one of these tries to exceed
FFFF (or tries to fall below 0000), as a string is being copied from one
region of memory to another, you'll get a protection violation.
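
If you would rather not scan the register dump by eye, a small script can
pick out the index registers that sit at either end of their range. The
Python sketch below is one way to do it, assuming the dump has been copied
from the screen exactly as shown earlier; parse_registers and
index_registers_at_edge are hypothetical names. A flagged register is only a
hint that a segment wrap may be involved, not proof.

```python
import re

def parse_registers(dump):
    """Map register names to integer values from an 'AX=0000 BX=...' dump."""
    pairs = re.findall(r"([A-Za-z]+)=([0-9A-Fa-f]{4})", dump)
    return {name: int(value, 16) for name, value in pairs}

def index_registers_at_edge(regs):
    """Return the SI/DI entries that hold 0000 or FFFF."""
    return {name: regs[name] for name in ("SI", "DI")
            if name in regs and regs[name] in (0x0000, 0xFFFF)}

DUMP = """AX=0000 BX=0000 CX=0000 DX=0000 SI=FFFF DI=0000 BP=0000
DS=1B12 ES=1B12 SS=1B12 SP=FFFE Flags=7246"""

regs = parse_registers(DUMP)
for name, value in index_registers_at_edge(regs).items():
    print("%s=%04X is at the edge of its segment" % (name, value))
```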

3) Instruction: A5 CC 00 00 00 00 00 00 00 00 00 00 00 00 00
Do you want to (T)erminate the program or (R)eboot?

This is the invalid instruction that the program was trying to execute when
the processor stopped it. Since most humans don't have a hope of interpreting
machine language by looking at the opcodes, you can get a better
interpretation of what is going on by examining this instruction with a
program that can render machine codes into assembly language. (Well... it's
better than nothing.) To do so, go into DEBUG; type DEBUG at the DOS prompt.

Enter the values from the Instruction line by typing

E 100

at DEBUG's hyphen prompt, and then entering each byte (pair of digits) from
the instruction line. Follow each byte with a space.
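
DEBUG's E command will also accept the whole list of bytes on the same line
as the address, which saves some typing. As a convenience, the Python sketch
below turns the Instruction line of a report into such a command; the
function name debug_enter_command is hypothetical, and the result is meant to
be typed (or pasted) at DEBUG's hyphen prompt.

```python
def debug_enter_command(instruction_line, address=0x100):
    """Build a DEBUG 'E' command from the report's Instruction line."""
    hex_part = instruction_line.split(":", 1)[-1]   # drop the "Instruction:" label
    tokens = hex_part.split()
    for token in tokens:
        if len(token) != 2:
            raise ValueError("expected two hex digits, got %r" % token)
        int(token, 16)                              # raises ValueError if not hex
    return "E %X %s" % (address, " ".join(t.upper() for t in tokens))

line = "Instruction: A5 CC 00 00 00 00 00 00 00 00 00 00 00 00 00"
print(debug_enter_command(line))
# E 100 A5 CC 00 00 00 00 00 00 00 00 00 00 00 00 00
```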

(As a bonus -- if you're running under DESQview, you can Mark the information
from the Exception #13 report, and Transfer it into DEBUG running in a
different Big DOS window.)

If most of the bytes begin with a 4, 5, 6, or 7, there's a good chance that
you're seeing a program trying to execute text, thinking that text to be code.
This can happen in several circumstances, but frequent offenders are those
programs which load code at the top of conventional memory during boot -- and
therefore during the OPTIMIZE process -- and presume that no program will
allocate that memory. Programs which place parts of themselves at the top of
conventional memory typically do so without protecting themselves from
programs like LOADHI which may need to allocate all conventional memory at
appropriate times; LOADHI (and programs like it) will overwrite the vulnerable
code.

As a real-world example, PROTMAN, a program whose purpose in life is to manage
the loading of various parts of 3Com and MS-LAN networks, did this in past
versions, as explained in Quarterdeck Technical Note #173, PROTMAN.TEC.
During the OPTIMIZE process, LOADHI would allocate all conventional memory
while it was determining the size of the various drivers that were being
loaded. PROTMAN would jump to what it thought was still its own code, but
there would be LOADHI signatures there -- text -- and PROTMAN would crash.

You can see the contents of this string if you Dump the instruction you just
entered; use DEBUG's D command to do this.

-d 100

1DC0:0100 4F 41 44 48 49 53 49 47-4E 41 54 55 52 BF 42 87 LOADHISIGNATUREB
1DC0:0110 98 FF 6F E2 E9 FF 00 00-26 21 F1 B3 34 00 AF 1D ..o.....&!..4...
1DC0:0120 01 00 D3 E0 0B E8 59 5F-07 B0 00 AA 5F 9D F8 C3 ......Y_...._...
1DC0:0130 AA 41 FE 06 AD 90 C3 2E-C7 06 CF 88 00 00 2E 89 .A..............

ASCII codes starting with 2 are generally punctuation marks; bytes 30-39
represent numeric digits; 3A-3F are punctuation, 41-5A are capital letters,
61-7A are small letters. Any instruction made up mostly of these numbers is
almost certainly text -- and therefore not executable program code. The
program that is trying to run such an instruction is doing so in error. When
the instructions are NOT mostly in the 40-7F range, you should try to
Unassemble them.
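
The rule of thumb above can be turned into a rough test. The Python sketch
below counts how many bytes fall in the printable ASCII range (20 through 7E
hexadecimal) and calls the bytes "text" when most of them do. The 75 percent
threshold and the function names are assumptions chosen for this
illustration, not anything that QEMM-386 or DEBUG uses.

```python
def printable_fraction(data):
    """Fraction of bytes that are printable ASCII (20h through 7Eh)."""
    if not data:
        return 0.0
    return sum(1 for b in data if 0x20 <= b <= 0x7E) / len(data)

def looks_like_text(data, threshold=0.75):
    """Guess whether a run of 'instruction' bytes is really text."""
    return printable_fraction(data) >= threshold

# First row of the DEBUG dump shown above, and the MOVSW example instruction.
dump_row = bytes.fromhex("4F 41 44 48 49 53 49 47 4E 41 54 55 52 BF 42 87")
movsw = bytes.fromhex("A5 CC 00 00 00 00 00 00 00 00 00 00 00 00 00")

print(looks_like_text(dump_row))   # True  (14 of 16 bytes are printable)
print(looks_like_text(movsw))      # False (no printable bytes at all)
```

Short runs of real code can pass this test by accident, so treat a True
result as a reason to look at the Dump, not as a verdict.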

-u 100

20C0:0100 A5 MOVSW
20C0:0101 CC INT 3
20C0:0102 0000 ADD [BX+SI],AL

This is the killer instruction from the example Exception #13 above. It's
performing a MOVSW (MOVe String Word) at a point when the SI register is FFFF,
and that means that it's trying to read a word value that begins at the last
byte of a segment, which (as described above) is illegal.
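
The same reasoning applies to the other string instructions, so it can be
handy to check the first byte of the Instruction line against a table of
them. The Python sketch below does this for the one-byte 8086 string opcodes
and reports which index register would run past offset FFFF. It is a
simplified model: it assumes 16-bit operands, skips only the REP prefixes (F2
and F3), and ignores the direction flag. The function name wrapping_operands
is hypothetical.

```python
# opcode: (mnemonic, operand size in bytes, uses DS:SI, uses ES:DI)
STRING_OPCODES = {
    0xA4: ("MOVSB", 1, True, True),
    0xA5: ("MOVSW", 2, True, True),
    0xA6: ("CMPSB", 1, True, True),
    0xA7: ("CMPSW", 2, True, True),
    0xAA: ("STOSB", 1, False, True),
    0xAB: ("STOSW", 2, False, True),
    0xAC: ("LODSB", 1, True, False),
    0xAD: ("LODSW", 2, True, False),
    0xAE: ("SCASB", 1, False, True),
    0xAF: ("SCASW", 2, False, True),
}

def wrapping_operands(instruction, si, di):
    """Return the index registers whose access would cross offset FFFF."""
    code = bytes(instruction).lstrip(b"\xF2\xF3")   # skip REP/REPNE prefixes
    if not code or code[0] not in STRING_OPCODES:
        return []                                   # not a string instruction
    _name, size, uses_si, uses_di = STRING_OPCODES[code[0]]
    suspects = []
    if uses_si and si + size - 1 > 0xFFFF:
        suspects.append("SI")
    if uses_di and di + size - 1 > 0xFFFF:
        suspects.append("DI")
    return suspects

# Instruction bytes and SI/DI values from the example report.
print(wrapping_operands(bytes.fromhex("A5 CC 00 00"), si=0xFFFF, di=0x0000))
# ['SI']
```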

Other invalid instructions are harder for the non-programmers of the world to
interpret. Often the first byte of an invalid instruction is 0F -- the first
byte of a number of instructions that are valid in protected mode, but which
the processor refuses to execute if the machine is in Virtual 86 mode. Exceptions of this kind
showed up more commonly in the past, with programs that were trying to enter
protected mode without calling the Virtual Control Program Interface. VCPI is
an industry-standard way for protected-mode software to coexist with 386
expanded memory managers such as QEMM-386; all 386 memory managers these days
are VCPI-providers, and almost all protected-mode programs are VCPI users (or
"clients"). Non-VCPI protected-mode programs include some memory- and
hardware-diagnostic programs, and programs that use the DPMI memory management
specification exclusively. Diagnostic programs typically recommend that you
disable all memory-management software during diagnosis. DPMI programs will
typically accept VCPI memory management; those rare programs that do not will
simply refuse to start up under QEMM-386. In such cases, you may install
QDPMI (the Quarterdeck DPMI Host) on your system; QDPMI is available on the
Quarterdeck BBS at (310) 314-3227, CompuServe (!GO QUARTERDECK), or large
local BBS systems.

How can an Exception #13 be fixed? Two Quarterdeck Technical Notes can help
you determine if you can solve the problem yourself. Quarterdeck Technical
Note #241, QEMM-386: General Troubleshooting (TROUBLE.TEC) is a good place to
start. This note describes common problems and possible solutions, and will
help if the cause of the Exception #13 is a memory conflict or bus-mastering
issue. Quarterdeck Technical Note #232, Exception #13 Advanced
Troubleshooting (EX13FLOW.TEC) should help you to determine if there is
anything at all that you can do yourself to fix the problem.

If you follow the instructions in both of these technical notes completely,
and the Exception #13 persists, the prospects for a resolution are bleak,
since the problem is almost certainly a bug in the offending program. If this
is so, unless you can alert the developer of the program (and make him or her
understand all this, which might be another task altogether), you can never
really make the problem go away, although sometimes you may be able to make it
subside.

Changing the location of the offending program in memory will sometimes help.
If you're running under DESQview, and you're sure that you've given the
program enough memory (i.e., all you can give it), try adding 16 to the size
of the script buffer on page 2 of Change a Program. If you're not running
under DESQview, try adding an extra file handle or two. The key here is to
change the location of the program in memory, which can occasionally be enough
to provide temporary relief from the Exception #13.

There is a substantial caveat: You're not fixing the problem by doing this;
you're just making it submerge. There's still probably a bug in the offending
program -- you've just changed it from a bomb to a landmine. If you can
reproduce the problem consistently, you should still contact the publisher of
the application with all of the data from the Exception #13 message, and all
of the data that you can supply about your system and its current
configuration.

With the exception (no pun intended) of the techniques mentioned above and in
EX13FLOW.TEC, non-programmers can do little to fix the root cause or even the
symptoms of Exception #13. If you are unsuccessful in resolving a conflict,
the information provided by the report should be forwarded, along with a
Manifest printout and a complete description of your system, to the developer
of the program that you were running at the time.

************************************************************************
* Trademarks are property of their respective owners. *
*This technical note may be copied and distributed freely as long as it*
*is distributed in its entirety and it is not distributed for profit. *
* Copyright (C) 1993 by Quarterdeck Office Systems *
************************ E N D O F F I L E *************************