Category : Miscellaneous Language Source Code
Archive   : CRCSET11.ZIP
Filename : CRCSET.DOC

 
Output of file : CRCSET.DOC contained in archive : CRCSET11.ZIP



























CRCSET version 1.1
Copyright (c) 1990 by Kevin Dean

Kevin Dean
Fairview Mall P.O. Box 55074
1800 Sheppard Avenue East
Willowdale, Ontario
CANADA M2J 5B9
CompuServe ID: 76336,3114
























Contents
--------

Warranty ............................ 1

License ............................. 2

Introduction ........................ 3

What is a CRC? ...................... 4

How CRCSET Works .................... 8

How to Use CRCSET ................... 13

Vulnerability ....................... 15


Warranty

The author of CRCSET (hereafter referred to as "the author") makes no
warranty of any kind, expressed or implied, including without limitation, any
warranties of merchantability and/or fitness for a particular purpose. The
author shall not be liable for any damages, whether direct, indirect, special,
or consequential arising from a failure of this program to operate in the
manner desired by the user. The author shall not be liable for any damage to
data or property which may be caused directly or indirectly by use of the
program.

In no event will the author be liable to the user for any damages,
including any lost profits, lost savings, or other incidental or consequential
damages arising out of the use or inability to use the program, or for any
claim by any other party.












































- Page 1 -


License

This program is public domain. As such, it may be freely distributed
by anyone by any means. No person or organization may charge for this program
except for a minimal charge to cover handling and distribution.

Having said that, I would also like to add that this algorithm has
taken a lot of time and work to develop. If you would like to make any
contribution for the use of this program, please do so, but it is not
required. I consider anti-virus algorithms too useful to be proprietary. As
for the contributions themselves, they don't have to be money. Send me copies
of your own programs or whatever you feel is appropriate.

Also, if you have any questions or would like to see some more
features in the program, drop me a note by surface or electronic mail (my
address is on the first page of this file). I will answer all mail regarding
this program.

Customization is available.








































- Page 2 -


Introduction

CRCSET is an anti-virus utility. Its purpose is to protect programs
(in which supporting code is linked) with one of the most effective weapons
against computer viruses: the Cyclic Redundancy Check, or CRC. A full
understanding of the CRC is not required to use this utility; if you like, you
can skip over the discussion of the CRC to the section entitled "How to Use
CRCSET".

There are many utilities that perform CRC checks on other programs
(VALIDATE.COM by McAfee associates is one) but most of these are external
programs that are usually run only once, if at all. The CRC generated by
these utilities must be compared to a value in an external file; if the values
match, the program is not infected.

This approach has two problems: the first is that the CRC check is
run only once when the user gets the program, if at all. Most people would
never run the check a second time. The second problem is that the CRC is
stored in an external file (e.g. the documentation). If someone wants to tack
a virus onto the program, it becomes a simple matter to run the validation
program, copy the CRC values to the documentation, and distribute the infected
program. Anyone running the validation program would find the same CRC in the
program as in the documentation, and in comes the virus.

Another (increasingly popular) approach is for the CRC to be stored in
the program itself (the .EXE file) and for the program to do its own check
every time it is loaded. This method is much more effective than the previous
one because it means that the moment the program is infected and the CRC
changes, the infection will be detected the next time the program is run.

There is one big problem with this method, but before I get into that,
we need some background.



























- Page 3 -


What is a CRC?

The CRC, or Cyclic Redundancy Check, is an error-checking algorithm
used in many types of computer operations, especially in data transfer. For
example, whenever your computer writes to disk, the disk controller calculates
the CRC of the data being written and writes it with the data. If your disk
should somehow become corrupted, the CRC check will fail when you next try to
read the data and the disk controller will return with an error, forcing DOS
to display the critical error "Data error reading drive C:". Most file
transfer protocols (like Xmodem, Zmodem, and some derivatives of Kermit) also
use a CRC check to validate the data being transmitted; if the CRC transmitted
with the data does not match the CRC calculated by the receiving program, then
the transmission has failed and the sending program is asked to retry the
transmission.

The actual calculation of the CRC is very simple. The algorithm
consists of two parts, a CRC polynomial and a CRC register, and is really an
exercise in modulo-2 arithmetic. The rules for modulo-2 arithmetic are shown
in the following table:

0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = 0

For those of you familiar with binary logic, these rules are equivalent to
the exclusive-or operation. Note: under modulo-2 arithmetic, subtraction is
equivalent to addition.

There is nothing magical about modulo-2 arithmetic and it has no
special properties that make it better suited to CRC calculations than
standard arithmetic; rather, since modulo-2 arithmetic doesn't carry from one
column to the next (i.e. 1 + 1 = 0 with no carry), it's just easier to
implement in hardware and software than any other method. Consider the
following:

1. Binary addition, normal rules of carry
101101001
+ 110110111
-----------
1100100000

2. Binary addition, modulo-2 arithmetic (no carry)
101101001
+ 110110111
-----------
011011110

The first addition requires us to carry any overflow from right to left. The
second addition requires no carry operations and can be performed much faster
both by humans and by computers.

The CRC algorithm can best be illustrated by the following diagram of
a 4-bit CRC generator:





- Page 4 -


CRC polynomial
------------1-----------0-----------1-----------1
| 3 | 2 1 | 0 |
| ----- v ----- ----- v ----- v
+ <-| x |<- + <-| x |<------| x |<- + <-| x |<- +
^ ----- ----- ----- -----
| CRC register
---- binary data stream

Each '+' symbol represents modulo-2 addition. The numbers above the CRC
register are the bit numbers of the register.

The CRC is calculated as follows:

1. Initialize the CRC register to 0.

2. Add the incoming bit of the data stream to the outgoing bit (bit 3) of the
CRC register.

3. Send the result of step 1 into the polynomial feedback loop.

4. Add the value in the feedback loop to the bits in the CRC register as it is
shifted left. The bits affected are determined by the CRC polynomial (i.e.
there is an addition for every bit in the polynomial that is equal to 1; if
the bit is 0, it is not fed back into the register). In this case, the
polynomial represented is 1011.

5. Repeat steps 2-4 for every bit in the data stream.

6. The CRC is the value in the register.

Let's try this with the data stream 11010111 and the polynomial 1011. The
result will be a 4-bit CRC.

The output stream to the left is the result of each addition operation
at the left-most gate. This is the value that is fed into the polynomial
feedback loop during the left shift.

------------1-----------0-----------1-----------1
| 3 | 2 1 | 0 |
| ----- v ----- ----- v ----- v
+ <-| 0 |<- + <-| 0 |<------| 0 |<- + <-| 0 |<- +
^ ----- ----- ----- -----
|
---- 11010111

------------1-----------0-----------1-----------1
| 3 | 2 1 | 0 |
| ----- v ----- ----- v ----- v
1 <- + <-| 1 |<- + <-| 0 |<------| 1 |<- + <-| 1 |<- +
^ ----- ----- ----- -----
|
---- 1010111






- Page 5 -


------------1-----------0-----------1-----------1
| 3 | 2 1 | 0 |
| ----- v ----- ----- v ----- v
10 <- + <-| 0 |<- + <-| 1 |<------| 1 |<- + <-| 0 |<- +
^ ----- ----- ----- -----
|
---- 010111

------------1-----------0-----------1-----------1
| 3 | 2 1 | 0 |
| ----- v ----- ----- v ----- v
100 <- + <-| 1 |<- + <-| 1 |<------| 0 |<- + <-| 0 |<- +
^ ----- ----- ----- -----
|
---- 10111

------------1-----------0-----------1-----------1
| 3 | 2 1 | 0 |
| ----- v ----- ----- v ----- v
1000 <- + <-| 1 |<- + <-| 0 |<------| 0 |<- + <-| 0 |<- +
^ ----- ----- ----- -----
|
---- 0111

------------1-----------0-----------1-----------1
| 3 | 2 1 | 0 |
| ----- v ----- ----- v ----- v
10001 <- + <-| 1 |<- + <-| 0 |<------| 1 |<- + <-| 1 |<- +
^ ----- ----- ----- -----
|
---- 111

------------1-----------0-----------1-----------1
| 3 | 2 1 | 0 |
| ----- v ----- ----- v ----- v
100010 <- + <-| 0 |<- + <-| 1 |<------| 1 |<- + <-| 0 |<- +
^ ----- ----- ----- -----
|
---- 11

------------1-----------0-----------1-----------1
| 3 | 2 1 | 0 |
| ----- v ----- ----- v ----- v
1000101 <- + <-| 0 |<- + <-| 1 |<------| 1 |<- + <-| 1 |<- +
^ ----- ----- ----- -----
|
---- 1

------------1-----------0-----------1-----------1
| 3 | 2 1 | 0 |
| ----- v ----- ----- v ----- v
10001011 <- + <-| 0 |<- + <-| 1 |<------| 0 |<- + <-| 1 |<- +
^ ----- ----- ----- -----
|
----

The CRC is 0101.


- Page 6 -


What should be obvious at this point is that if a single bit in the
data stream is changed, the value in the CRC register is corrupted completely.
The feedback loop ensures that the error is propagated throughout the entire
calculation.

Most CRC algorithms use either a 16- or 32-bit polynomial. The longer
the polynomial, the more effective it is at catching errors; a 16-bit CRC, for
example, algorithm catches more than 99.99% of all errors in a data stream.

All other CRC algorithms are analogous to the 4-bit algorithm
presented here. There are some optimizations that can process several bits at
a time; the source code included with this program uses a lookup table that
can process 8 bits at once. For further discussion of the CRC algorithm and
its variations, I recommend "C Programmer's Guide to Serial Communications" by
Joe Campbell, published by Howard W. Sams & Company.












































- Page 7 -


How CRCSET Works

The idea of storing a program's CRC in the executable file itself has
one drawback: since the CRC is part of the program, it becomes part of the
data stream that is used to calculate itself. In other words, you have to
know what the CRC of the program is in order to calculate it! At compile and
link time, this is downright impossible; changing the slightest thing in your
code will change the CRC the next time you recompile.

Most algorithms that store the CRC in the program itself get around
this drawback by breaking up the program into three chunks:

+------------------------+-----+------------------------+
| <-- Program part 1 --> | CRC | <-- Program part 2 --> |
+------------------------+-----+------------------------+

The CRC is then calculated as the concatenation of parts 1 and 2, i.e. when
the CRC is calculated, it skips over itself completely in the calculation.
What it sees is this:

+------------------------+------------------------+
| <-- Program part 1 --> | <-- Program part 2 --> |
+------------------------+------------------------+

In order for a virus to infect any program that uses this method, it
must somehow find the location of the CRC within the file and recalculate the
CRC using the following data stream:

+------------------------+------------------------+---------------+
| <-- Program part 1 --> | <-- Program part 2 --> | <-- Virus --> |
+------------------------+------------------------+---------------+

It must then overwrite the old CRC with the new one.

I won't explain how (I don't want to give any virus-writers any
ideas), but with the right technique the CRC can be found, recalculated, and
rewritten in under 30 seconds.

CRCSET overcomes this limitation by making the polynomial and the CRC
part of the data stream. In order to calculate the CRC, CRCSET looks for a
predefined string in the program (the default is DEAN_CRC), replaces the first
four bytes with a 32-bit polynomial, sets the next four bytes (the true CRC)
to 0, and calculates the intermediate CRC assuming that the true CRC is 0.
Then, through the magic of matrix algebra, CRCSET calculates what the true CRC
should have been in order to yield itself instead of the intermediate CRC at
the end. Let's take a look at a 4-bit CRC calculation as an example.

Let's assume that the polynomial in use is 1011, that the CRC
calculated up to the point where we reach the search string (represented by
the bit pattern STUVWXYZ) is 0010, and that the bit pattern 1100 follows the
search string:








- Page 8 -


------------1-----------0-----------1-----------1
| 3 | 2 1 | 0 |
| ----- v ----- ----- v ----- v
+ <-| 0 |<- + <-| 0 |<------| 1 |<- + <-| 0 |<- +
^ ----- ----- ----- -----
|
---- STUVWXYZ1100

1. Replace the first four bits (STUV) with the CRC polynomial (1011):

------------1-----------0-----------1-----------1
| 3 | 2 1 | 0 |
| ----- v ----- ----- v ----- v
+ <-| 0 |<- + <-| 0 |<------| 1 |<- + <-| 0 |<- +
^ ----- ----- ----- -----
|
---- 1011WXYZ1100

2. Calculate the value of the CRC register with the polynomial in the data
stream:

------------1-----------0-----------1-----------1
| 3 | 2 1 | 0 |
| ----- v ----- ----- v ----- v
+ <-| 1 |<- + <-| 0 |<------| 0 |<- + <-| 1 |<- +
^ ----- ----- ----- -----
|
---- WXYZ1100

3. Replace the next four bits (WXYZ) with simple variables (X3, X2, X1, X0):

------------1-----------0-----------1-----------1
| 3 | 2 1 | 0 |
| ----- v ----- ----- v ----- v
+ <-| 1 |<- + <-| 0 |<------| 0 |<- + <-| 1 |<- +
^ ----- ----- ----- -----
|
---- (X3)(X2)(X1)(X0)1100

4. Propagate X3+(bit 3) through the feedback loop:

---------------1-----------0------------1--------------1
| 3 | 2 1 | 0 |
| -------- v ----- ------ v -------- v
+ <-| X3+1 |<- + <-| 0 |<------| X3 |<- + <-| X3+1 |<- +
^ -------- ----- ------ --------
|
---- (X2)(X1)(X0)1100











- Page 9 -


5. Propagate X2+(bit 3) through the feedback loop:

------------------1------------0------------1-----------------1
| 3 | 2 1 | 0 |
| ----------- v ------ ------ v ----------- v
+ <-| X3+X2+1 |<- + <-| X3 |<------| X2 |<- + <-| X3+X2+1 |<- +
^ ----------- ------ ------ -----------
|
---- (X1)(X0)1100

In bit 1, for example, we have (X2+(bit 3))+(bit 0) = (X2+X3+1)+(X3+1) = X2
since the X3 terms cancel, no matter what the value of X3 is.

6. Propagate X1+(bit 3) through the feedback loop:

------------------1------------0------------1--------------------1
| 3 | 2 1 | 0 |
| ----------- v ------ ------ v -------------- v
+ <-| X2+X1+1 |<- + <-| X2 |<------| X1 |<- + <-| X3+X2+X1+1 |<- +
^ ----------- ------ ------ --------------
|
---- (X0)1100

7. Propagate X0+(bit 3) through the feedback loop:

------------------1------------0---------------1--------------------1
| 3 | 2 1 | 0 |
| ----------- v ------ --------- v -------------- v
+ <-| X1+X0+1 |<- + <-| X1 |<------| X3+X0 |<- + <-| X2+X1+X0+1 |<- +
^ ----------- ------ --------- --------------
|
---- 1100

8. Propagate the next bit through the feedback loop:

-------------1---------------0--------------1---------------1
| 3 | 2 1 | 0 |
| ------ v --------- -------- v --------- v
+ <-| X0 |<- + <-| X3+X0 |<------| X2+1 |<- + <-| X1+X0 |<- +
^ ------ --------- -------- ---------
|
---- 100

9. Repeat step 8 for all remaining bits:

---------------------1---------------0--------------1---------------1
| 3 | 2 1 | 0 |
| -------------- v --------- -------- v --------- v
+ <-| X3+X2+X1+1 |<- + <-| X3+X0 |<------| X2+1 |<- + <-| X3+X2 |<- +
^ -------------- --------- -------- ---------
|
----







- Page 10 -


We want the CRC in the register to be equal to the unknown CRC we
started inserting at step 4, i.e. we need:

N Value calculated for bit N Bit N
--- -------------------------- -----
3 X3 + X2 + X1 + 1 = X3
2 X3 + X0 = X2
1 X2 + 1 = X1
0 X3 + X2 = X0

If we collect all the variables on the left and all the constants on the
right (keeping in mind that we are dealing with modulo-2 arithmetic):

X2 + X1 = 1
X3 + X2 + X0 = 0
X2 + X1 = 1
X3 + X2 + X0 = 0

The value 1010 is the intermediate CRC mentioned earlier.

Here we have an interesting situation. The first and third equations
are the same and so are the second and fourth. What we come down to is this:

X2 + X1 = 1
X3 + X2 + X0 = 0

We have four variables and only two equations. There is no unique solution;
in fact, there are four (2 to the power of (4 - number of independent
equations)) separate and distinct sets of values that will satisfy these
equations.

Since CRCSET needs a numeric solution, we have to arbitrarily set bits
to get one. For arguments sake, let's set X2 to 1.

1 + X1 = 1
X3 + 1 + X0 = 0

In other words:

X1 = 0
X3 + + X0 = 1

By setting X2 to 1, we have also fixed X1. Now let's set X0 to 0.

X3 + + 0 = 1

In other words:

X3 = 1

We now have a solution for the CRC of the program: 1100. There are three
others: 0101, 0010, and 1011. If we replace the string WXYZ with any of these
values, the CRC calculation process will yield that value at the end, e.g.:






- Page 11 -


------------1-----------0-----------1-----------1
| 3 | 2 1 | 0 |
| ----- v ----- ----- v ----- v
+ <-| 0 |<- + <-| 0 |<------| 1 |<- + <-| 0 |<- +
^ ----- ----- ----- -----
|
---- 101111001100
----

yields

------------1-----------0-----------1-----------1
| 3 | 2 1 | 0 |
| ----- v ----- ----- v ----- v
+ <-| 1 |<- + <-| 1 |<------| 0 |<- + <-| 0 |<- +
^ ----- ----- ----- -----
|
----

If you're not sure about this, try it with pen and paper. Plug in each of the
four values and you should get that same value at the end of the CRC
calculation process. To help you out, here are the intermediate values of the
CRC register for each solution (the first value is the value after step 2 of
the calculation):

CRC
-----
1100: 1001, 0010, 1111, 0101, 1010, 0100, 0011, 0110, 1100
0101: 1001, 1001, 0010, 0100, 0011, 1101, 1010, 1111, 0101
0010: 1001, 1001, 1001, 0010, 0100, 0011, 1101, 0001, 0010
1011: 1001, 0010, 0100, 0011, 1101, 1010, 0100, 1000, 1011

The fact that there is not a unique solution isn't really important;
only about 30% of the time will there be a unique solution. This does not
diminish the effectiveness of the CRC calculation because whichever of the
four values the CRC is set to, any virus installing itself in the program will
still change it. The fact that we did not get a unique solution does mean,
however, that it is possible to get the following situation:

X2 + X1 = 1
X3 + X2 + X0 = 1
X2 + X1 = 1
X3 + X2 + X0 = 0

Here equations 2 and 4 contradict each other. There are no values of X3 to X0
that will satisfy these equations. If the CRCSET program comes across this
situation, it will simply try again.

For illustration, I have used only a 4-bit CRC; the CRCSET algorithm
uses 32 bits. The principle is the same; it just takes more time (and ink,
paper, patience, caffeine, pizza, and chocolate chip cookies).








- Page 12 -


How to Use CRCSET

This is the easy part.

For C programmers, add the files VALIDCRC.C and VIRUSDAT.C to the list
of files required to build the program you are working on (in Turbo C, add
them to the the project file). Add a call to isvalidcrc("PROGNAME.EXE")
somewhere in your program (preferably in main()). The function isvalidcrc()
returns 1 if the CRC is valid or 0 if it is invalid, if the program file is
not found, or if the CRC polynomial has been reset to 0.

For Turbo Pascal programmers, add the unit VirusCRC to your "uses"
clause and call the function IsValidCRC('PROGNAME.EXE'). IsValidCRC returns
TRUE if the CRC is valid or FALSE if it is invalid, if the program file is not
found, or if the CRC polynomial has been reset to 0.

Example programs, TESTCRC.C and TESTCRC.PAS, have been provided.

Once you have compiled your program, you have to calculate its CRC.
The program CRCSET.EXE has been provided for this purpose. The syntax is:

CRCSET [-] file [file [file [...]]]
is an 8-character (0-padded) string in which to store CRC data
(default is DEAN_CRC),
[file] is one or more files for which a CRC is to be calculated.

The string for which CRCSET.EXE searches is stored in the variable _viruscrc
in C and _VirusCRC in Turbo Pascal. The default is DEAN_CRC but you may
change it if there is a conflict (i.e. if there is more than one instance of
DEAN_CRC in the program, CRCSET.EXE will not know which one holds the CRC and
so will not set it). The string is replaced with a randomly-generated
polynomial and the CRC itself.

For example, to set the CRC for either of the test programs, the
command is:

CRCSET TESTCRC.EXE

If you change the string in one of the test programs to something like
"MyName", you would enter:

CRCSET -MyName TESTCRC.EXE

The case of the string on the command line must match exactly the case of the
string in the program. Also, any strings shorter than 8 characters must be
padded with 0's in the program.

If you run TESTCRC before setting the CRC, it will abort with a
warning that it may have been infected. After you set the CRC, run TESTCRC to
assure yourself that its CRC is valid.

If you want to test the reliability of the CRC check, change a few
bytes in TESTCRC.EXE (be careful, changing bytes in the code segment can hang
the program, so try bytes in the data segment, towards the end). Run TESTCRC
again, and it should warn you that it may have been infected.

Despite its complexity, CRCSET.EXE takes only a few seconds to
calculate the CRC of the target file. I have made some optimizations to the

- Page 13 -


algorithm that make the calculation time almost constant regardless of the
size of the file.

Once a CRC has been determined for your program, it takes little time
for the validation function to verify it every time the program is run.

CRCSET will display the name of the file that it is working on and
one or more of the following messages:

CRC search string "" found more than once.

The search string specified occurs more than once in the file.
This is an error, since CRCSET has no way of knowing which occurrence
of the string it is supposed to replace with the polynomial and CRC.
Change the search string in _viruscrc (C version) or _VirusCRC (Turbo
Pascal version), recompile, and try again with the new string as an
argument to CRCSET (e.g. CRCSET -NewStrng PROG.EXE).

CRC search string "" not found.

The search string specified does not occur in the file. Make
sure that the string passed to CRCSET is correct and also that the
validation module is being linked (i.e. that there is a call to the
isvalidcrc() or IsValidCRC somewhere in your program).

Testing polynomial abcdef12 ... no solution.

The matrix generated by CRCSET using the polynomial abcdef12
does not have a solution. CRCSET will try again with another
polynomial.

Testing polynomial abcdef12 ... CRC is 34567890.
It is unique.

The matrix generated by CRCSET using the polynomial abcdef12
has the unique solution 34567890, i.e. 34567890 is the CRC of your
program under the polynomial abcdef12. A unique solution will occur
only about 30% of the time.

Testing polynomial abcdef12 ... CRC is 34567890.
It is not unique (2^N solutions).

The matrix generated by CRCSET using the polynomial abcdef12
has the non-unique solution 34567890, i.e. 34567890 is the CRC of your
program under the polynomial abcdef12. The number of free variables
(the number of bits in the CRC whose value is irrelevant to the
solution) is N; each bit has two possible values, so there are 2^N
possible solutions. It is theoretically possible that all 32 bits
will be free variables, but this is extremely unlikely. The fact that
there is not a unique solution does not diminish the effectiveness of
the CRC validation in any way.

Note: the validation function (both the C and Pascal versions) finds
the name of the running program in _argv[0] (or ParamStr(0)) if the program is
running under DOS versions 3 or greater and searches the DOS PATH for the
program under DOS version 2.



- Page 14 -


Vulnerability

The CRCSET algorithm, like every other anti-virus algorithm, is
vulnerable to attack. Hand-tweaking the code to bypass the virus protection
is always possible. Direct attack to determine the storage location of the
polynomial and the CRC and to change it is also possible, but this can take
upwards of 5-10 minutes on a 386. Any virus that ties up the computer for
that long wins no points for discretion. Any user that doesn't notice a
system lockup lasting over 30 seconds probably has many other doors open for
viruses anyway.

There is no substitute for proper precautions: downloading from a
reputable BBS, avoiding pirated software, scanning programs for viruses before
using them, and so on. This program was developed with the knowledge that
most people don't take these precautions (based on a sample size of at least 1
- me); rather than leave it up to the end user to protect against viruses,
with this we programmers can take on some of the burden by protecting the
programs we write against them.









































- Page 15 -



  3 Responses to “Category : Miscellaneous Language Source Code
Archive   : CRCSET11.ZIP
Filename : CRCSET.DOC

  1. Very nice! Thank you for this wonderful archive. I wonder why I found it only now. Long live the BBS file archives!

  2. This is so awesome! 😀 I’d be cool if you could download an entire archive of this at once, though.

  3. But one thing that puzzles me is the “mtswslnkmcjklsdlsbdmMICROSOFT” string. There is an article about it here. It is definitely worth a read: http://www.os2museum.com/wp/mtswslnk/