Welcome to Small-c:PC. Small-c:PC is a compiler that runs
under PC-DOS on the IBM Personal Computer (PC). The source input
to the compiler is written in small-c, a subset of the C
programming language. The compiler outputs symbolic assembly
language code that can be assembled on the PC using the ASM or
MASM assembler programs available from IBM.
The reference manual for C is THE C PROGRAMMING LANGUAGE book
published by Prentice-Hall and authored by Brian W. Kernighan
and Dennis M. Ritchie. The original compiler for Small-c was
written by Ron Cain as a personal project (see Dr. Dobb's
Journal, #45, Volume V, Number 5 for a description of small-c).
A CP/M version of the compiler for the Intel 8080 is being
distributed by The Code Works, 5266 Hollister, Suite 224, Santa
Barbara, California 93111 (805) 683-1585.
After using the CP/M version of the compiler, we decided to
port it over to the IBM PC (so we could take some of our small-c
programs over with us). The conversion effort was guided by a
desire to not alter the personality of the original small-c
compiler. The objective was to minimize the effort required to
convert existing small-c programs to the Small-c:PC environment.
We think we have been successful since our small-c programs have
been converted to the PC with few problems using Small-c:PC.
The compiler was converted by first deciding what the output
should look like and then modifying it to generate code for the
Intel 8088 instead of the 8080. In parallel, the run time
library was rewritten in 8088 assembly language to operate under
PC-DOS. If you have The Code Works run time library for CP/M and
are interested in the differences between CP/M and PC-DOS, you
might take the time to compare our library with theirs.
Most of the compiler conversion problems centered around
ASM's need for memory. The goal was to produce a tool that could
be used on a 64KB PC with two 160KB drives. Since ASM cannot
deal with large programs in a 64 KB configuration, the small-c
compiler was modified to produce code that could be assembled
separately and then put together using LINK. The original
small-c compiler produces one large output file. Small-c:PC can
produce multiple output files (one for each input file). These
files can be assembled separately using ASM and then LINKed
together. This can significantly reduce program development
time, since only the modified file needs to be recompiled in
the event of source code changes. It can then be assembled and
linked with existing object files.
If you have more than 64KB, ASM can assemble larger files.
With sufficient memory, you can work with larger small-c program
files. The memory requirement for using Small-c:PC, however, is
imposed by ASM, not by the Small-c:PC compiler. Small-c:PC will
compile large programs quite nicely in 64KB.
If you examine your distribution copy of the compiler, named
CPCN.C, you will notice that the source code is marked so that it
can be broken into many smaller files. The markers are small-c
comment statements written as:
/* ### cpcn-xx */
Each of these comment statements marks the start of a
separate file. We marked it this way so that users with only
64KB can modify the compiler if they choose to do so. It will be
necessary, however, to insert the proper external declarations
into each file. For those of you with more memory, it is a
simple matter to generate a new version of the compiler after
making any source editing changes.
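As a rough illustration of how the markers could be used by a splitting tool, the following sketch in standard C recognizes a marker line and counts the sections in a source text. The function names and the exact marker prefix are assumptions made for this example; they are not part of the distribution.

```c
#include <string.h>

/* Assumed marker prefix, taken from the comment form shown above.
   The "xx" part is treated as an opaque file suffix. */
#define SPLIT_MARKER "/* ### cpcn-"

/* Return 1 if the line starts a new file section, 0 otherwise. */
int is_split_marker(const char *line)
{
    return strncmp(line, SPLIT_MARKER, strlen(SPLIT_MARKER)) == 0;
}

/* Count the marker lines in a source text. Text that appears before
   the first marker is not counted as a section. */
int count_sections(const char *text)
{
    int count = 0;
    const char *p = text;

    while (p != NULL && *p != '\0') {
        if (is_split_marker(p))
            count++;
        p = strchr(p, '\n');   /* find the end of the current line */
        if (p != NULL)
            p++;               /* step to the start of the next line */
    }
    return count;
}
```

A splitter built on this would open a new output file each time is_split_marker() returns 1. The external declarations each file needs would still have to be added by hand.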
We have also distributed the source code for FORMAT, a text
processor described in the book SOFTWARE TOOLS by Brian W.
Kernighan and P.J. Plauger and published by Addison-Wesley.
This program is written in small-c. This manual was produced
using it. The file FORMAT.DOC contains a brief description of
the FORMAT program and how to use it.
As a final note before we get into the operational details of
the compiler you should be aware of the fact that it may contain
bugs. We have tested it quite a bit, but you know how those
little rascals can hide. So beware, one may sneak up and bite
you (usually in the wee hours at the worst time). If you find
any of these critters, please write us and describe the problem.
We have priced Small-c:PC to recover our development cost only.
Please don't call us to discuss problems over the phone.
II. OPERATING Small-c:PC
The compiler is initiated by entering CPC in response to the
PC-DOS command prompt A>. The compiler clears the screen, greets
you and asks two questions. The possible answers are contained
in parentheses following each question. The capitalized response
is the default taken if you press the ENTER key. The first
question asked is:
Should I pause after an error (y,N)?
Answering Y to this question causes the compiler to pause
after displaying an error. This will give you an opportunity to
continue the compilation or not. Moreover, in the event of a lot
of screen activity during a compilation, this ensures that you
won't miss an error message. The N response causes the compiler
to continue automatically after displaying an error.
The second question asked is:
Do you want the Small-c:PC-text to appear (y,N)?
Answering Y to this question causes the compiler to write the
input source code into the output file(s) as comment statements.
Each small-c statement appears with a semicolon as the first
character (to make it a comment to ASM) followed by the assembly
language code generated by the compiler for that statement. This
interleaving of source code and generated code is very useful in
learning how the compiler implements various small-c statements.
Choosing this option causes the output files to be larger,
however. Answering N will cause the compiler to not write the
small-c source to the output file.
The two previous questions are followed by requests for input
and output filenames. There are no default extensions supplied
by the compiler. Each input file generates a separate output file.
You can break a large small-c program into separate smaller
files and feed these to the compiler. Hopefully ASM will be able
to swallow the resultant output files without running out of
memory. Again, if you have more than 64KB, ASM should be able to
process a large output file. In this case you will not be forced
to divide a large small-c program into multiple files.
The next request by the compiler is:
Input filename?
The small-c source code is contained in the file you name in
response to this question. There is no default extension supplied
by the compiler.
A single function definition cannot be spread out across
multiple input files. This is because the compiler assumes the
output file corresponding to each input file will be separately
assembled. It writes extra assembly language statements into
each output file to support this. A function spread across two
input files may not assemble correctly. Also, due to the way the
compiler handles externals, it is possible that a function name
could be multiply defined and the compiler not detect it. This
can happen if the separate definitions occur in different input
files. In this circumstance, the error will be detected by LINK.
The runtime library (CPCLIB.ASM) is not input to the compiler
as in other incarnations of small-c. Instead, it is input to
LINK as just another object file. LINK will bind all of the
object inputs together to produce an execute (.EXE) file.
If your response to the input filename request is the ENTER
key or a space (as the first character), the compiler terminates
and returns control to PC-DOS. This is the way the compiler is
Following the input filename request is the question:
Output filename?
The assembly language generated by the compiler for the
previous input file is written into the named file. Normally
this file will have the extension .ASM (not supplied
automatically by the compiler) since it will be input to the
assembler. If you press ENTER instead of providing a file name,
the compiler will direct its output to the display. You might
try this initially to get a feel for the code the compiler generates.
Let's consider the interactions to compile a sample program.
Suppose the program is broken into two files named "SAMPLE-1.C"
and "SAMPLE-2.C". You should first format a PC-DOS data disk and
copy over to it the following files.
CPC.EXE [from the Small-c:PC distribution disk]
We assume the following files are on your system disk which is in
ASM.EXE [from your IBM supplied macro assembler disk]
LINK.EXE [from your IBM supplied PC-DOS disk]
Note: You could use MASM instead of ASM.
Get started by entering the following (the disk you made is in
drive B, and drive B is the logged-in disk).
B>CPC [invoke the compiler]
* * * Small-C:PC V1.1 * * * [first line of a clear screen]
By Ron Cain, Modified by CAPROCK SYSTEMS for the IBM PC
Distributed by: CAPROCK SYSTEMS, INC.
P.O. Box 13814
Arlington, Texas 76013
PC-DOS Version N: June, 1982
Should I pause after an error (y,N)? Y [You don't want
to miss any]
Do you want the Small-c:PC-text to appear (y,N)? N [no]
Input filename? SAMPLE-1.C
Output filename? SAMPLE-1.ASM
====== main () [you know when it starts on a new
====== plc() function]
There were 0 errors in compilation.
Input filename? SAMPLE-2.C [the program is stored in two files]
Output filename? SAMPLE-2.ASM
There were 0 errors in compilation.
Input filename? [press ENTER]
Notice that the two input files could have been processed in
separate executions of the compiler. SAMPLE-2.C contains the
necessary external data declarations to inform the compiler about
referenced data allocated elsewhere.
The output files are assembled next.
Next, we want to produce an execute file. You do this by
executing LINK. Our example assumes LINK inputs as required by
PC-DOS Version 1.1. If you have Version 2.0 your LINK inputs
will be slightly different, but the results should be the same.
The order of the object file names supplied to LINK is
Then you are ready to execute the small-c program. This is
accomplished by typing the .EXE file name.
The SAMPLE program provided on the distribution disk types a text
file onto the display. It obtains the file name to operate on
from the command line.
When the compiler detects an error in the small-c program, it
displays a message on the screen. An example would be:
Line 20, main + 0: missing open paren
The error occurred on the 20th line of the input file. The
function being compiled was "main". The error occurred 0 lines
into the function. The error detected was a "missing open
paren". The hat character (^) shows how far into the line the
compiler had read when it detected the error. The compiler
continues automatically if you answered N to the first question
asked by the compiler (see example above). If you answered Y to
this question, you will see the following message displayed.
Continue (Y,n,g) ?
Pressing Y (or just ENTER) causes the compiler to continue
processing the source input. If you type N, the compiler
displays the message
and returns to PC-DOS. If you answer G, the compiler continues
processing the source input, but will no longer pause after an error.
Pressing CTRL+BREAK at any time will abort the compiler and
return you to PC-DOS. If the compiler is terminated by
CTRL+BREAK, no input or output files are closed.
III. USING THE LIBRARY FUNCTIONS
All of the modules whose entry point names begin with CC are
used to support the compiler generated code. As a user, you will
probably never use these routines directly. The functions that
start with QZ are user callable. They can be divided into PC-DOS
interface routines and system interface routines. The PC-DOS
interface routines generally provide I/O through the operating
system. The disk I/O functions buffer only one 512 byte sector
at a time (each open file has its own sector buffer space,
however). This combined with the fact that the transfer width
between a small-c program and the disk routines is only one byte
causes file I/O to be somewhat slow. Also, the library routines
support only ASCII files. Certain characters are given special
meanings. As a result, you cannot manipulate binary files with
small-c programs. These file types include .OBJ, .EXE and .COM files.
THE PC-DOS INTERFACE LIBRARY ROUTINES
The following presents examples to illustrate the PC-DOS
interface routines. The small-c declarations are simply
illustrations of what can be done. There are myriad ways to
accomplish the same coding example. The PC-DOS function numbers
mentioned in the descriptions are given in decimal.
1. Read a character from the keyboard.
c = getchar();
Reads a character from the keyboard using PC-DOS function 1.
The character read is echoed back to the display. Extended ASCII
codes will require two calls to this function. A second call is
indicated if the returned character is null. If the character
input is a carriage return, a line feed is also echoed back to
the display. If the character is CTRL-Z, a -1 is returned
2. Write a character to the display.
c = putchar(c);
The character in the low order byte of c is written to the
display using PC-DOS function 2. Refer to appendix G of the
BASIC manual to determine the effect of each possible character
code. If the character passed is a carriage return, a line feed
is also sent to the display. This function returns the character
passed to it.
3. Read a line from the keyboard.
Reads one line of characters into the character array buffer
using PC-DOS function 10 for buffered keyboard input. Editing of
the buffer during character entry is supported by PC-DOS (see
chapter 1 of the DOS manual). A null character is placed at the
end of the line (replaces the usual carriage return at the end of
the line). Note: the buffer is assumed to be at least 80 bytes
4. Print a line on the display.
Each character of the buffer is written to the display using
PC-DOS function 2 (display character). Refer to appendix G of
the BASIC manual to see how the character codes are interpreted.
Characters are sent to the display until a null character is
encountered. The null character is not sent to the display. No
carriage return or line feed is automatically sent to the
5. Open a disk file for processing.
ptr = fopen(name,mode);
The named file is opened for processing using DOS function
15. The name is parsed using DOS function 41 before the open
call. The mode determines how the file is opened. An "r" or "R"
opens it for input and "w" or "W" opens it for output. Notice
that mode is a pointer to a string. The string contains the
character indicating the desired mode. No error checks are made.
The pointer returned is an offset into the library data segment
of an I/O structure. The structure consists of the FCB followed
by the sector buffer (see CPCLIB data segment). This pointer
must be passed to functions getc, putc and fclose described
below. If the open fails, a zero is returned to ptr. The open
can fail for a variety of reasons. No more than four files may
be open at one time. So lack of an available I/O structure can
cause failure. The filename supplied could be in error or not
exist. It could be that the mode indicated is not one of the
four possible characters indicated above. Programming note: to
test if a file exists before opening it for output, first open it
for input. If this open is successful the file exists.
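The programming note above can be sketched as a small helper. The version below is written in standard C so that it runs on any host. The name file_exists is an assumption made for this example. In small-c the test is the same, except that the value returned by fopen is an offset and a failed open returns zero.

```c
#include <stdio.h>

/* Return 1 if the named file can be opened for input, 0 otherwise.
   The file is closed again immediately so that it does not occupy
   one of the limited I/O structures. */
int file_exists(const char *name)
{
    FILE *fp = fopen(name, "r");

    if (fp == NULL)
        return 0;      /* open for input failed: treat as "not there" */
    fclose(fp);
    return 1;
}
```

Closing the file after the test matters under Small-c:PC, since no more than four files may be open at one time and files are not closed automatically.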
6. Close a disk file.
The file described by the I/O structure indicated by ptr is
closed to further processing. Any unwritten characters in the
sector buffer are written to disk first. No error check is made
on the value in ptr. The function returns a zero if the close
fails. It returns a non-zero value if the close is successful.
Note that files are not automatically closed when the program
7. Read the next character from an opened disk file.
c = getc(ptr);
The next unread character is returned. The ptr is the I/O
structure offset returned by fopen. The file is assumed to be a
text file. When a carriage return is read, the character that
immediately follows the carriage return is presumed to be a line
feed and is discarded automatically. (No check is made to verify
that it was a line feed). When a CTRL-Z or a physical
end-of-file is detected, a -1 is returned. A read error also
returns a -1.
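To make the text file rules concrete, here is a host-side model of the behavior described for getc, written in standard C. It reads from a memory buffer rather than a disk file. The structure and function names are assumptions made for this sketch; the library itself is written in assembly language and is organized differently.

```c
#include <stddef.h>

#define CR     13
#define CTRL_Z 26

/* A byte source standing in for an opened disk file. */
struct text_src {
    const unsigned char *data;
    size_t len;    /* number of bytes in data */
    size_t pos;    /* index of the next unread byte */
};

/* Return the next character, or -1 at CTRL-Z or physical end of data. */
int text_getc(struct text_src *s)
{
    int c;

    if (s->pos >= s->len)
        return -1;             /* physical end of file */
    c = s->data[s->pos];
    if (c == CTRL_Z)
        return -1;             /* logical end of file; position is not advanced */
    s->pos++;
    if (c == CR && s->pos < s->len)
        s->pos++;              /* byte after CR is dropped without being checked */
    return c;
}
```

The model shows why binary files cannot be read reliably: any byte with the value 26 ends the file, and any byte following a 13 is silently lost.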
8. Write a character to an opened disk file.
c = putc(c,ptr);
The character is buffered into the sector buffer indicated by
the ptr (see fopen). If the character is a carriage return, a
line feed is automatically buffered. A physical disk write
occurs when the sector buffer is filled. This function returns
the argument character if no error occurs. A -1 is returned if
an error occurs.
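The buffering described for putc can be modeled in the same way. The sketch below, in standard C, keeps one 512 byte sector buffer and counts a "write" each time it fills. The names are assumptions made for this example, and the disk write itself is only simulated.

```c
#include <stddef.h>

#define SECTOR_SIZE 512
#define CR 13
#define LF 10

/* Stand-in for the output side of an I/O structure. */
struct text_dst {
    unsigned char buf[SECTOR_SIZE];
    size_t used;     /* bytes waiting in the sector buffer */
    long flushed;    /* number of full sectors "written" so far */
};

/* Place one byte in the buffer; flush when the sector is full. */
static void buffer_byte(struct text_dst *d, int c)
{
    d->buf[d->used++] = (unsigned char)c;
    if (d->used == SECTOR_SIZE) {
        d->flushed++;          /* a real library would write the sector here */
        d->used = 0;
    }
}

/* Buffer a character, adding a line feed after a carriage return.
   Returns the character passed in. */
int text_putc(int c, struct text_dst *d)
{
    buffer_byte(d, c);
    if (c == CR)
        buffer_byte(d, LF);
    return c;
}
```

Because nothing reaches the disk until the buffer fills, a program that ends without calling fclose can lose up to one sector of output.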
9. Call to PC-DOS.
ax = pcdos(ah,dx);
This function calls PC-DOS. The low order byte of the first
argument is placed into the AH register. The second argument is
placed into the DX register. PC-DOS returns a value in the AX
register. This value is stored into the variable ax as
This function is useful for supporting I/O to the printer or
communications device. The following function sends the passed
character to the printer.
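A sketch of such a function is given here in standard C. It assumes PC-DOS function 5 (printer output), which takes the character in the DL register. The name lprint is hypothetical, and the pcdos routine shown is only a host-side stand-in that records its arguments so the example can be compiled away from the PC. On the PC the library routine would be linked in its place.

```c
/* Host-side stand-in for the library routine pcdos(ah,dx).
   It records the register values instead of calling PC-DOS. */
static int last_ah;
static int last_dx;

int pcdos(int ah, int dx)
{
    last_ah = ah & 255;    /* only the low order byte is placed in AH */
    last_dx = dx;
    return 0;              /* the real routine returns the AX register */
}

#define DOS_PRINTER_OUTPUT 5   /* PC-DOS function 5: printer output */

/* Send one character to the printer and return it (hypothetical name). */
int lprint(int c)
{
    pcdos(DOS_PRINTER_OUTPUT, c & 255);   /* character travels in DL */
    return c;
}
```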
THE SYSTEM INTERFACE LIBRARY ROUTINES
Like those used above, the following additional declarations
are made to illustrate usage of the system interface library
routines. These routines generally provide access to the
hardware on the PC or to special software elements of the system.
Some of the declared names refer to 808x registers. When the
name of an 8-bit register appears as an argument in the examples
below, the low order byte of the value passed is copied into the
808x register with the same name to execute the function
indicated. If a 16-bit register is designated, the full 16-bit
argument is loaded into the 808x register with the same name.
1. Send a byte to a physical output port.
The low order byte of the second argument is sent to the
hardware port address indicated by the first argument. No value
is returned. Refer to the PC Technical Reference Manual for a
description of the physical I/O ports on the PC.
2. Input a byte from a physical input port.
val = in808x(port);
An IN instruction is executed using the hardware port address
provided by the argument. The byte read is sign extended and
returned as a 16-bit value. Refer to the PC Technical Reference
Manual for a description of the physical I/O ports on the PC.
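Because the byte is sign extended, port values of 128 and above come back as negative numbers. The following standard C sketch models the extension and shows the usual remedy of masking with 255. The function names are assumptions made for this example.

```c
/* Sign extend the low order byte of v, as described for in808x. */
int sign_extend_byte(int v)
{
    v = v & 255;
    if (v & 128)
        return v - 256;    /* high bit set: value becomes negative */
    return v;
}

/* Recover the unsigned port value (0 to 255) from a returned value. */
int low_byte(int v)
{
    return v & 255;
}
```

For example, a port reading of 200 is returned as -56, and low_byte(-56) yields 200 again. In a small-c program the same effect is obtained by writing val = in808x(port) & 255;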
3. Display control through the PC ROM BIOS.
PC-DOS does not support complete display capabilities as
provided on the PC. This function allows the small-c programmer
control over the display as supported by the ROM BIOS routines.
The PC Technical Reference Manual contains a description in the
ROM listings of the required parameter values. Certain functions
may not require all of the argument registers. A dummy argument
must be provided, however, since the library routine expects all
of the indicated arguments (it is not function sensitive).
4. Control and I/O through the asynchronous port.
ax = int14(ah,al,dx);
Support of the async adapter through PC-DOS is not complete
(especially on status information). This function allows the
small-c programmer greater control over the comm port. Again,
the ROM listings in the PC Technical Reference Manual contain a
complete description of the parameters for this function.
5. Sound the bell.
This function simply calls PC-DOS to display the bell
6. Clear the display buffer (and hence the display screen).
This is essentially a clear screen function as provided on
many dumb terminals. This function illustrates how the PC
programmer may manipulate the display memory directly to manage
7. Copy code segment prefix into a small-c data array.
The program prefix as described in the DOS manual contains
information that may be useful to the small-c programmer. For an
example, study the sample program provided on the distribution
disk. This function copies all 256 bytes of the prefix into
string. Using appropriate offsets (or subscripts), the contents
of the prefix area can be examined.
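One common use of the prefix is to fetch the text typed after the program name. The standard C sketch below assumes the layout given in the DOS manual: the byte at offset 128 (80 hex) holds the length of the command tail and the characters start at offset 129. The function name and constants are assumptions made for this example, and the prefix is passed in as an ordinary 256 byte array.

```c
#define PREFIX_SIZE 256
#define TAIL_COUNT  128    /* offset 80h: length of the command tail */
#define TAIL_TEXT   129    /* offset 81h: first character of the tail */

/* Copy the command tail out of a 256 byte prefix image into out,
   skipping leading blanks, and null terminate it. At most
   outsize-1 characters are copied (outsize must be at least 1).
   Returns the number of characters copied. */
int command_tail(const unsigned char *prefix, char *out, int outsize)
{
    int count = prefix[TAIL_COUNT];
    int i = 0;
    int n = 0;

    if (count > PREFIX_SIZE - TAIL_TEXT)
        count = PREFIX_SIZE - TAIL_TEXT;   /* guard against a bad count */
    while (i < count && prefix[TAIL_TEXT + i] == ' ')
        i++;                               /* skip leading blanks */
    while (i < count && n < outsize - 1)
        out[n++] = (char)prefix[TAIL_TEXT + i++];
    out[n] = '\0';
    return n;
}
```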
8. Exit to PC-DOS.
This is the function to use in exiting a small-c program at a
point other than a normal return from the main() function. The
exit function assumes that the DS and SS registers are unchanged
from their contents at program entry.
IV. ASSEMBLY LANGUAGE INTERFACE
Some remaining portions of this manual are reproduced from
the user manual for the small-c compiler distributed by The Code
Works. Interfacing to assembly language is accomplished in two
ways. As the library routines demonstrate, you can simply code a
module in the code segment CSEG, assemble it and LINK will
resolve the call if the function name is made PUBLIC. You can
build your own assembly language library to LINK with small-c
programs that you write.
The compiler also supports a language construct that permits
in-line assembly language code to be directly inserted into the
generated output file. This language construct is the
#asm...#endasm statements. Like all preprocessor commands, #asm
and #endasm must be entered in lower case. Since it is
considered by the compiler to be a single statement, it may
appear anywhere a statement is needed. For example,
if(...) #asm...#endasm else ...
Due to the workings of the preprocessor (which must be
suppressed by this construct), the pseudo-op #asm must be the
last item before the carriage return on the end of the line
(since the text between #asm and the carriage return is thrown
away). The parser is free-format (outside of these exceptions).
So the expected format is as follows:
if (...) #asm [nothing following #asm]
A semicolon is not required after the #endasm.
Assembly language code within the #asm...#endasm context can
access all global variables and functions by name. It is up to
the programmer to know the data type of a variable (i.e. whether
to access a byte or a word). Global variables should be accessed
relative to the stack segment as opposed to the data segment. To
store the AX register into the variable named intvar, code
All global variables and function names have a 'QZ' prefix
added by the compiler. This is illustrated above. As another
illustration, to call putchar() in an assembler routine, code
CALL QZPUTCHAR. Since the library is not assembled with the
generated code, it is necessary to tell the assembler that a
library name is external. Insert the statement
in your assembly language code. If putchar() is called by the
small-c code containing your assembler code, then you do not need
to insert the EXTRN statement. The compiler will generate one
for the reference in the small-c code. A similar situation
exists for global data items. For instance, if intvar is not
defined (or referenced) by containing small-c code, it will be
necessary to code
For other illustrations of this, refer to the generated code for
the sample program on the distribution disk to see how the
compiler handles similar references.
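The naming rule can be summarized with a small helper in standard C. It assumes, based on the QZPUTCHAR example above, that the assembler name is the prefix QZ followed by the small-c name in upper case. The function name asm_name is hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Build the assembler name for a small-c global: "QZ" followed by
   the name in upper case. Returns 1 on success, or 0 if the output
   buffer is too small to hold the result. */
int asm_name(const char *cname, char *out, size_t outsize)
{
    size_t len = strlen(cname);
    size_t i;

    if (outsize < len + 3)      /* 2 prefix characters + name + null */
        return 0;
    out[0] = 'Q';
    out[1] = 'Z';
    for (i = 0; i < len; i++)
        out[i + 2] = (char)toupper((unsigned char)cname[i]);
    out[len + 2] = '\0';
    return 1;
}
```

With this rule, putchar becomes QZPUTCHAR and intvar becomes QZINTVAR, which are the names to use in EXTRN statements and in in-line assembly code.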
External assembly language routines invoked by function calls
from the small-c code have access to all registers. However, the
DS and SS (and naturally CS) must be preserved across the
assembly language code. All other registers can be altered
without restoration. The calling program removes arguments from
the stack upon return. The function should not prune the stack
RUN TIME CODE STRUCTURE AND SEGMENT USAGE
The compiler generates three segments as a result of
processing the user's small-c program. Executable code is placed
in segment CSEG with a class 'code'. Data items are stored in
the segment STACK with the class 'stack'. No information is
stored in generated segment DUMMY. It is produced to avoid a
LINK error message. The run time library makes use of a data
segment DATASEG also in the class 'code'. The LINK program
combines all output files with specified libraries to produce the
executable module. This module, when loaded into memory, has the
segments in class 'code' first followed by the stack segment
whose class is 'stack'. The entry point is CCGO in the run time
library. Routine CCGO loads the stack segment register and sets
the stack pointer SP to the highest possible value. It pushes
information necessary to return to DOS onto the stack, then calls
the user's main() function. The exit() routine is entered either
by a call from the user program or upon a return from main().
The function exit() cleans the stack off up to the information
placed there by CCGO. It then does a long return to DOS.
During execution, the stack is used extensively. Function
arguments are placed onto the stack in their textual order (left
to right). This is illustrated below by the code generated for
the following statement.
Notice that the compiler generated code to clean up the stack.
Local variables are allocated onto the stack. The current value
of SP thus becomes their address. For example, inside a
function, the statement:
generates the code PUSH CX to occupy two bytes on the stack.
References to the value k use the current value of SP. If
another value is defined, such as:
the compiler would generate
to reserve three bytes on the stack. The offset of array is the
current value of SP. So array[0] is at SP+0, array[1] at SP+1,
array[2] at SP+2, and k would now be at SP+3. Thus, assembly
language code in the statement #asm...#endasm cannot access local
variables by name. They can be accessed by knowing how many
intervening bytes have been allocated between the declaration of
the variable and its use. It is worth noting that local
declarations use only as much stack space as required, including
an odd number of bytes. However, function arguments always
consist of two bytes apiece. If a function argument is of type
char (one byte), then it is sign extended to obtain a 2-byte value
to push onto the stack.
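The offset bookkeeping needed to reach a local variable from in-line assembly code can be modeled as follows. This standard C sketch assumes only what is stated above: each declaration takes exactly as many bytes as it needs, and every later declaration moves the earlier ones further from SP. The structure and function names are assumptions made for this example.

```c
#define MAX_LOCALS 8

/* Sizes of the locals declared so far, in declaration order. */
struct frame {
    int size[MAX_LOCALS];
    int count;
};

/* Record a local of the given byte size.
   Returns its index, or -1 if the table is full. */
int declare_local(struct frame *f, int bytes)
{
    if (f->count >= MAX_LOCALS)
        return -1;
    f->size[f->count] = bytes;
    return f->count++;
}

/* Offset from SP of a local: the total size of everything
   declared after it. */
int sp_offset(const struct frame *f, int index)
{
    int off = 0;
    int i;

    for (i = index + 1; i < f->count; i++)
        off = off + f->size[i];
    return off;
}
```

Declaring a 2 byte int and then a 3 byte char array reproduces the case above: the array starts at SP+0 and the int moves from SP+0 to SP+3. Values pushed temporarily during expression evaluation or a function call would add to these offsets as well.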
Appendix A: Small-c:PC COMPILER SPECIFICATION
The compiler supports the following.
1. Data type declarations can be:
extern char external 8-bits
extern int external 16-bits
extern external 16-bits
A pointer to either of these types is declared by placing an
asterisk "*" before the pointer name. A pointer is a 16-bit
2. Arrays must be single dimension (vector) structures of type
char or int.
"&" address of scalar
"++" increment, either prefix or postfix
"--" decrement, either prefix or postfix
"%" mod, i.e. remainder from division
"|" inclusive or
"^" exclusive or
"&" logical and
"==" test for equality
"!=" test for inequality
"<" test for less than
"<=" test for less or equal
">" test for greater than
">=" test for greater or equal
"<<" arithmetic shift left
">>" arithmetic shift right
quoted string ("sample")
primed string ('a' or '10')
local variable (or pointer)
global (static) variable (or pointer)
4. Program control:
while (expression) statement;
; (null statement)
local and static pointers can contain the
address of "char" or "int" data items.
6. Compiler commands:
#define name string
(preprocessor will replace name by string
throughout the program text)
(Input is suspended from the input filename and
text is read from the file named in the include statement. When
end-of-file is detected, input is resumed from the input
filename. A separate output file is not created for the #include
file. Its output is directed to the currently open output file.)
(see section IV for description)
7. Miscellaneous notes:
Expression evaluation maintains the same hierarchy as standard C.
Function calls are defined as any primary followed by an open
parenthesis. Legal forms include:
NOTE: the various function call forms are not supported in
Pointer arithmetic takes into account the data type the pointer
was declared for (e.g. ptr++ will increment by 2 if declared
Pointers are compared as unsigned 16-bit values.
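Both pointer rules can be checked on a host machine with the standard C sketch below. It uses the fixed-width types from stdint.h to stand in for the 16-bit int and 16-bit addresses of small-c. The function names are assumptions made for this example.

```c
#include <stdint.h>

/* Bytes a pointer to a 16-bit int advances for one increment. */
int int_step(void)
{
    int16_t cells[2] = { 0, 0 };
    int16_t *p = cells;
    char *start = (char *)p;

    p++;
    return (int)((char *)p - start);
}

/* Bytes a pointer to char advances for one increment. */
int char_step(void)
{
    char cells[2] = { 0, 0 };
    char *p = cells;
    char *start = p;

    p++;
    return (int)(p - start);
}

/* Compare two 16-bit addresses as unsigned values, the way
   pointers are compared. */
int ptr_less(uint16_t a, uint16_t b)
{
    return a < b;
}

/* Interpret a 16-bit pattern as a signed value. */
static int as_signed16(uint16_t v)
{
    if (v >= 32768u)
        return (int)v - 65536;
    return (int)v;
}

/* The same comparison done on signed values, for contrast. */
int signed_less(uint16_t a, uint16_t b)
{
    return as_signed16(a) < as_signed16(b);
}
```

The distinction matters for addresses at or above 8000 hex. As unsigned values 7FFF hex is less than 8000 hex, while a signed comparison of the same bit patterns gives the opposite answer.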
The generated code is pure. Data is separated from executable
The generated code is reentrant. Since local variables are
allocated on the stack, each new invocation of a function
generates a new copy of local variables.
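A recursive function is the simplest demonstration of this property. The example below is a sketch in standard C; the function name is an assumption, and a small-c version would use the older style of argument declaration.

```c
/* Sum the integers from 1 to n. Each invocation has its own copy
   of n and rest on the stack, so the nested calls do not disturb
   one another. */
int sum_to(int n)
{
    int rest;

    if (n == 0)
        return 0;
    rest = sum_to(n - 1);    /* inner call gets fresh locals */
    return n + rest;
}
```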
Appendix B: COMPILER RESTRICTIONS AND LIMITATIONS
The compiler does not support:
1. Structures and unions
2. Multi-dimensional arrays
3. Floating point data
4. Long integers
5. Functions that return anything but "int" values
6. Unary operators "!", "~", "sizeof", casts
7. The operators "&&", "||", "?:", and ","
8. Assignment operators:
+=, -=, *=, /=, %=, >>=, <<=, &=, ^=,
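Most of the missing operators can be replaced by constructs the compiler does support. The following sketch, in standard C, shows one possible substitute for each of several common cases. The function names are assumptions made for this example, and the bodies use only operators that appear in Appendix A.

```c
/* a && b, written as nested tests. As with "&&", b is only
   examined when a is non-zero. */
int both(int a, int b)
{
    if (a != 0)
        if (b != 0)
            return 1;
    return 0;
}

/* a || b, written as two tests in sequence. */
int either(int a, int b)
{
    if (a != 0)
        return 1;
    if (b != 0)
        return 1;
    return 0;
}

/* !a, written as a comparison with zero. */
int is_zero(int a)
{
    return a == 0;
}

/* ~a, written as an exclusive or with all one bits. */
int complement(int a)
{
    return a ^ -1;
}

/* x += n, written out in full. */
int add_to(int x, int n)
{
    x = x + n;
    return x;
}
```

The conditional operator can be replaced in the same spirit by an if...else statement that assigns to a variable.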