ID:PE Parity Errors
Quarterdeck Technical Note #128 Filename: PARITY.TEC
by Bryan Scott CompuServe: PARITY.TEC
Last revised: 1/13/93 Category: HW

Subject: An explanation of what hardware parity errors are, and some
suggestions on how to find and correct the problem.

* Since I installed QEMM on my system I have been getting "parity
errors". What are parity errors and why are they occurring?

A parity error is a hardware memory error that is reported through the
Non-Maskable Interrupt (NMI). The memory controller chip reports a parity
error when it reads a byte of data and the 9 bits used to store the byte (8
data bits plus 1 parity bit) do not contain an odd number of 1s (odd parity).
Parity errors are always hardware-related. Software applications cannot cause
parity errors (though an application may cause one to be detected).

The error may have appeared only after you installed QEMM-386 because QEMM-386
uses memory that has probably never been used before, and that memory could
have been marginally bad to begin with. However, before we explain why parity
errors are always an indication of hardware problems, let's discuss what a
parity error is.

In the digital world, all information is represented by the binary digits 0
and 1. The binary digit, or bit, is the fundamental building block of digital
information in a computer, and it stores information in two states: off or on
(0 or 1, respectively). One bit can make a big difference. Here's why:

The ASCII code for the letter U, in binary, is:

01010101

If you change just the fourth bit from the left, from 1 to 0, the binary
number becomes the code for the letter E:

01000101
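
As a small illustration of the point above, the following Python sketch
inverts one bit of the code for U and prints the result. The function name
flip_bit and its position-from-the-left convention are choices made for this
example only; they do not come from any PC software.

```python
def flip_bit(value, position_from_left, width=8):
    """Return value with one bit inverted; position 1 is the leftmost bit."""
    mask = 1 << (width - position_from_left)
    return value ^ mask

u = ord("U")          # 0b01010101
e = flip_bit(u, 4)    # invert the fourth bit from the left

print(format(u, "08b"), chr(u))   # 01010101 U
print(format(e, "08b"), chr(e))   # 01000101 E
```

A single inverted bit is enough to turn one valid character into a different,
equally valid character, which is why memory needs some way to notice it.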

Now while there are 8 bits in a byte, your memory controller handles
information 9 bits at a time. This extra bit is called a "parity bit",
and is the computer's way to verify the integrity of your data. Whenever
you write data to memory, the memory controller adds up the number of 1s
in each byte of information, then sets the ninth bit to make the sum of
all nine bits odd. IBM could have chosen to make the sum of the nine
bits even (even parity), but they chose to store data in memory with
odd parity and every other PC manufacturer followed suit.
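
The rule for setting the ninth bit can be sketched in a few lines of Python.
This is only a model of what the hardware does when a byte is written; the
name odd_parity_bit is made up for this example.

```python
def odd_parity_bit(byte):
    """Return the ninth bit that makes the total count of 1s odd."""
    ones = bin(byte & 0xFF).count("1")
    # An even count of 1s needs a 1 added; an odd count is left alone.
    return 1 if ones % 2 == 0 else 0

for ch in "UE":
    byte = ord(ch)
    print(ch, format(byte, "08b"), "parity bit:", odd_parity_bit(byte))
# U 01010101 parity bit: 1
# E 01000101 parity bit: 0
```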

In the example above, the letter U has the binary value of 01010101, which has
four 1s in it, so its parity bit is set to 1. The letter E is 01000101, which
has three 1s in it, so its parity bit is set to 0. When your PC reads each
byte of data, it sums the 9 bits to make sure the number of 1s is still odd.
If the state of a single bit gets changed from 1 to 0, or 0 to 1, the parity
of the nine bits becomes even and the memory controller asserts the NMI
(Non-Maskable Interrupt). This signal is put directly on a pin of the CPU, and
the code pointed to by Interrupt 2 then posts a Parity Error message, which
warns you that there is a problem with your RAM.
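
The write-then-check cycle described above can be modeled as follows. This is
a rough sketch, not a description of any real controller: the names store and
load are hypothetical, the parity bit is packed into bit 8 of one integer
purely for convenience, and a Python exception stands in for the NMI.

```python
def store(byte):
    """Model a write: return a 9-bit word with the parity bit above the data."""
    ones = bin(byte & 0xFF).count("1")
    parity = 1 if ones % 2 == 0 else 0
    return (parity << 8) | (byte & 0xFF)

def load(word):
    """Model a read: fail if the nine bits no longer hold an odd number of 1s."""
    if bin(word & 0x1FF).count("1") % 2 == 0:
        raise RuntimeError("parity error")   # stands in for the NMI
    return word & 0xFF

word = store(ord("U"))
print(chr(load(word)))            # U

damaged = word ^ 0b000010000      # one data bit changes while in memory
try:
    load(damaged)
except RuntimeError as err:
    print(err)                    # parity error
```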

Please note that all of these operations are performed directly by your
computer's hardware, regardless of which operating system (DOS, OS/2,
UNIX) you use, and regardless of which utility programs or application
software you are running. One exception is Macintosh computers, most of
which use 8-bit SIMMs that have no parity bit. When memory errors occur
on such a system, nothing detects them and the system just malfunctions
from the invalid data. Also remember that parity checking is only
reliable for detecting a single changed bit in a byte. If two bits in
the same byte get changed, the sum is still odd and the error will not
be detected.
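
To make that limitation concrete, the sketch below tries every possible
one-bit and two-bit change to a stored 9-bit value and counts how many slip
past an odd-parity check. The helper names are invented for this example, and
packing the parity bit into bit 8 of an integer is a modeling convenience.

```python
from itertools import combinations

def parity_ok(word):
    """True if the nine bits contain an odd number of 1s."""
    return bin(word & 0x1FF).count("1") % 2 == 1

def count_missed(word, flips):
    """Count the ways of flipping `flips` distinct bits that still pass."""
    missed = 0
    for positions in combinations(range(9), flips):
        damaged = word
        for p in positions:
            damaged ^= 1 << p
        if parity_ok(damaged):
            missed += 1
    return missed

word = (1 << 8) | ord("U")      # 'U' plus its odd-parity bit of 1
print(count_missed(word, 1))    # 0  (every single-bit change is caught)
print(count_missed(word, 2))    # 36 (every two-bit change goes unnoticed)
```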

* OK, so it is a hardware problem. Which hardware and how can I find it?

Obviously, the first thing to check is the RAM in your system. An easy test
is to disable everything that uses EMS and XMS memory so you can create a
RAMDRIVE the size of all your system memory. Then:

a) Run CHKDSK on the RAMDRIVE, or
b) Copy files to the RAMDRIVE until it is full.

Either way, if the memory is bad you will eventually get a parity error or a
General Drive Failure on the RAMDRIVE. The first thing you can do to try to
remedy this problem is to make sure that the RAM chips are seated properly in
their sockets. If they are DRAMs or SIPPs, make sure the pins aren't broken
off or bent. If they are SIMMs or the memory is on a card, you may just need
to clean the contacts. If the chips physically check out OK, the chip speeds
could be mismatched, with memory that is too slow for the CPU/memory bus, or
a controller chip could be bad. At this point the only sure way to test this
is to swap out the chips for ones that you know are good.

Parity errors may also be caused by the presence of an autoswitching video
card or one that is using 16-bit ROM access. Your motherboard could also be
assigning parity to the address space where your EMS page frame is located.
There may also be some special features of the computer in the CMOS Setup
that could be causing problems. Try disabling the computer's shadowing of BIOS
or video ROM, or turning off memory caching or other features, to see if one
of them is involved. This may allow you to pinpoint the cause of the problem.
In all these cases you should refer to the documentation that came with your
hardware product to find out how to disable a particular feature.

While there are several diagnostic programs on the market that will test your
memory for errors, they may not duplicate conditions that would cause marginal
memory to fail, and most are not even designed to be run with a memory
manager. When parity errors are encountered, it is time to have the hardware
components of the machine examined.

************************************************************************
* Trademarks are property of their respective owners. *
*This technical note may be copied and distributed freely as long as it*
*is distributed in its entirety and it is not distributed for profit. *
* Copyright (C) 1993 by Quarterdeck Office Systems *
************************ E N D O F F I L E *************************