Category : Word Processors
Archive   : NGPERL10.ZIP
Filename : LANGUAGE.DAT

 

!short: Intro to Perl

perl [options] filename args

Perl is an interpreted language optimized for scanning arbi-
trary text files, extracting information from those text
files, and printing reports based on that information. It's
also a good language for many system management tasks. The
language is intended to be practical (easy to use, effi-
cient, complete) rather than beautiful (tiny, elegant,
minimal). It combines (in the author's opinion, anyway)
some of the best features of C, sed, awk, and sh, so people
familiar with those languages should have little difficulty
with it. (Language historians will also note some vestiges
of csh, Pascal, and even BASIC-PLUS.) Expression syntax
corresponds quite closely to C expression syntax. Unlike
most Unix utilities, perl does not arbitrarily limit the
size of your data--if you've got the memory, perl can slurp
in your whole file as a single string. Recursion is of
unlimited depth. And the hash tables used by associative
arrays grow as necessary to prevent degraded performance.
Perl uses sophisticated pattern matching techniques to scan
large amounts of data very quickly. Although optimized for
scanning text, perl can also deal with binary data, and can
make dbm files look like associative arrays (where dbm is
available). Setuid perl scripts are safer than C programs
through a dataflow tracing mechanism which prevents many
stupid security holes. If you have a problem that would
ordinarily use sed or awk or sh, but it exceeds their capa-
bilities or must run a little faster, and you don't want to
write the silly thing in C, then perl may be for you. There
are also translators to turn your sed and awk scripts into
perl scripts. OK, enough hype.

This manual applies only to the MS-DOS version. Unix-only
features are mentioned but are not fully documented here.
The MS-DOS version of perl attempts to duplicate the Unix
version's functionality but is crippled by MS-DOS and by the
severe memory limitations MS-DOS imposes. The MS-DOS ver-
sion is nevertheless useful for text processing and for lim-
ited applications involving subprocess management.

!short: Upon Startup...

Upon startup, perl looks for your script in one of the fol-
lowing places:

1. Specified line by line via -e switches on the command
line.

2. Contained in the file specified by the first filename on
the command line. (Note that systems supporting the #!
notation invoke interpreters this way.)

3. Passed in implicitly via standard input. This only
works if there are no filename arguments--to pass argu-
ments to a stdin script you must explicitly specify a -
for the script name.

After locating your script, perl compiles it to an internal
form. If the script is syntactically correct, it is exe-
cuted.

!short: Data Types and Objects

Data Types and Objects

Perl has three data types: scalars, arrays of scalars, and
associative arrays of scalars. Normal arrays are indexed by
number, and associative arrays by string.

The interpretation of operations and values in perl some-
times depends on the requirements of the context around the
operation or value. There are three major contexts: string,
numeric and array. Certain operations return array values
in contexts wanting an array, and scalar values otherwise.
(If this is true of an operation it will be mentioned in the
documentation for that operation.) Operations which return
scalars don't care whether the context is looking for a
string or a number, but scalar variables and values are
interpreted as strings or numbers as appropriate to the con-
text. A scalar is interpreted as TRUE in the boolean sense
if it is not the null string or 0. Booleans returned by
operators are 1 for true and 0 or '' (the null string) for
false.
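
A minimal sketch of these rules (the variable names are made
up for illustration): the same scalar is read as a number or
as a string depending on the operator, and only the null
string and 0 test as false.

```perl
# The same scalar serves as a string or a number, as the context requires.
$n = '10';
$sum = $n + 5;          # numeric context: 15
$str = $n . 5;          # string context: '105'

# Only the null string and 0 count as false.
@truth = ();
foreach $value ('', 0, '0', 1, 'abc') {
    push(@truth, $value ? 'true' : 'false');
}
print "$sum $str @truth\n";
```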

There are actually two varieties of null string: defined and
undefined. Undefined null strings are returned when there
is no real value for something, such as when there was an
error, or at end of file, or when you refer to an uninitial-
ized variable or element of an array. An undefined null
string may become defined the first time you access it, but
prior to that you can use the defined() operator to deter-
mine whether the value is defined or not.
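
As a small illustration (the variable $answer is hypothetical),
defined() can tell an uninitialized variable apart from one
that has been given a null string:

```perl
# An uninitialized variable holds an undefined null string.
$before = defined($answer) ? 'defined' : 'undefined';

# Assigning even a null string makes the value defined.
$answer = '';
$after = defined($answer) ? 'defined' : 'undefined';

print "before: $before, after: $after\n";
```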

References to scalar variables always begin with '$', even
when referring to a scalar that is part of an array. Thus:

$days # a simple scalar variable
$days[28] # 29th element of array @days
$days{'Feb'} # one value from an associative array
$#days # last index of array @days

but entire arrays or array slices are denoted by '@':

@days # ($days[0], $days[1],... $days[n])
@days[3,4,5] # same as @days[3..5]
@days{'a','c'} # same as ($days{'a'},$days{'c'})

and entire associative arrays are denoted by '%':

%days # (key1, val1, key2, val2 ...)

Any of these eight constructs may serve as an lvalue, that
is, may be assigned to. (It also turns out that an assign-
ment is itself an lvalue in certain contexts--see examples
under s, tr and chop.) Assignment to a scalar evaluates the
righthand side in a scalar context, while assignment to an
array or array slice evaluates the righthand side in an
array context.

You may find the length of array @days by evaluating
"$#days", as in csh. (Actually, it's not the length of the
array, it's the subscript of the last element, since there
is (ordinarily) a 0th element.) Assigning to $#days changes
the length of the array. Shortening an array by this method
does not actually destroy any values. Lengthening an array
that was previously shortened recovers the values that were
in those elements. You can also gain some measure of effi-
ciency by preextending an array that is going to get big.
(You can also extend an array by assigning to an element
that is off the end of the array. This differs from assign-
ing to $#whatever in that intervening values are set to null
rather than recovered.) You can truncate an array down to
nothing by assigning the null list () to it. The following
are exactly equivalent:

@whatever = ();
$#whatever = $[ - 1;


If you evaluate an array in a scalar context, it returns the
length of the array. The following is always true:

@whatever == $#whatever - $[ + 1;
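
The following sketch pulls these points together. It assumes
the array base $[ has been left at its default of 0, and the
array contents are invented for the example.

```perl
@days = ('Mon', 'Tue', 'Wed');
$last  = $#days;        # subscript of the last element: 2
$count = @days;         # array in a scalar context: 3

# Assigning past the end extends the array; the gap is filled with nulls.
$days[5] = 'Sat';
$newcount = @days;      # now 6

# Assigning the null list truncates the array to nothing.
@days = ();
$empty = @days;         # 0

print "$last $count $newcount $empty\n";
```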


Multi-dimensional arrays are not directly supported, but see
the discussion of the $; variable later for a means of emu-
lating multiple subscripts with an associative array. You
could also write a subroutine to turn multiple subscripts
into a single subscript.
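
One way such a subroutine might look is sketched below. The
name cell and the variable $ncols are hypothetical; the idea
is simply row-major numbering of a table with a fixed number
of columns.

```perl
# Hypothetical helper: fold a (row, column) pair into one subscript
# for a table that is $ncols columns wide.
$ncols = 4;
sub cell {
    local($row, $col) = @_;
    $row * $ncols + $col;
}

$table[&cell(0, 0)] = 'top left';
$table[&cell(2, 3)] = 'bottom right';

$index = &cell(2, 3);   # 2 * 4 + 3 = 11
print "$index: $table[$index]\n";
```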

Every data type has its own namespace. You can, without
fear of conflict, use the same name for a scalar variable,
an array, an associative array, a filehandle, a subroutine
name, and/or a label. Since variable and array references
always start with '$', '@', or '%', the "reserved" words
aren't in fact reserved with respect to variable names.
(They ARE reserved with respect to labels and filehandles,
however, which don't have an initial special character.
Hint: you could say open(LOG,'logfile') rather than
open(log,'logfile'). Using uppercase filehandles also
improves readability and protects you from conflict with
future reserved words.) Case IS significant--"FOO", "Foo"
and "foo" are all different names. Names which start with a
letter may also contain digits and underscores. Names which
do not start with a letter are limited to one character,
e.g. "$%" or "$$". (Most of the one character names have a
predefined significance to perl. More later.)
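
To see the separate namespaces at work, the sketch below
gives one name, days, to a scalar, an array, an associative
array and a subroutine. The values are invented for the
example.

```perl
$days = 7;                          # scalar
@days = ('Mon', 'Tue');             # array
%days = ('Feb', 28);                # associative array
sub days { 'from the subroutine'; } # subroutine

$sub = &days;
print "$days $days[0] $days{'Feb'} $sub\n";
```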


You can also embed newlines directly in your strings, i.e.
they can end on a different line than they begin. This is
nice, but if you forget your trailing quote, the error will
not be reported until perl finds another line containing the
quote character, which may be much further on in the script.
Variable substitution inside strings is limited to scalar
variables, normal array values, and array slices. (In other
words, identifiers beginning with $ or @, followed by an
optional bracketed expression as a subscript.) The follow-
ing code segment prints out "The price is $100."

$Price = '$100'; # not interpreted
print "The price is $Price.\n";# interpreted

Note that you can put curly brackets around the identifier
to delimit it from following alphanumerics. Also note that
a single quoted string must be separated from a preceding
word by a space, since single quote is a valid character in
an identifier (see Packages).
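
A short sketch of the curly bracket form, using a made-up
variable $unit: without the brackets perl reads the longer
name $unitgram, which has never been set.

```perl
$unit = 'kilo';
$wrong = "one $unitgram";     # reads the unset variable $unitgram
$right = "one ${unit}gram";   # reads $unit, then the literal 'gram'
print "[$wrong] [$right]\n";
```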

Two special literals are __LINE__ and __FILE__, which
represent the current line number and filename at that point
in your program. They may only be used as separate tokens;
they will not be interpolated into strings. In addition,
the token __END__ may be used to indicate the logical end of
the script before the actual end of file. Any following
text is ignored (but may be read via the DATA filehandle).
The two control characters ^^D and ^^Z are synonyms for
__END__.

A word that doesn't have any other interpretation in the
grammar will be treated as if it had single quotes around
it. For this purpose, a word consists only of alphanumeric
characters and underline, and must start with an alphabetic
character. As with filehandles and labels, a bare word that
consists entirely of lowercase letters risks conflict with
future reserved words, and if you use the -w switch, Perl
will warn you about any such words.

Array values are interpolated into double-quoted strings by
joining all the elements of the array with the delimiter
specified in the $" variable, space by default. (Since in
versions of perl prior to 3.0 the @ character was not a
metacharacter in double-quoted strings, the interpolation of
@array, $array[EXPR], @array[LIST], $array{EXPR}, or
@array{LIST} only happens if array is referenced elsewhere
in the program or is predefined.) The following are
equivalent:

$temp = join($",@ARGV);
system "echo $temp";

system "echo @ARGV";
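
The effect of $" can be seen without running a command. In
this sketch the array @parts is invented for illustration:

```perl
@parts = ('usr', 'local', 'bin');
$default = "@parts";    # joined with a space: 'usr local bin'

$" = '/';               # change the delimiter
$path = "@parts";       # 'usr/local/bin'
$same = join($", @parts);

$" = ' ';               # restore the default
print "$default\n$path\n";
```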

Within search patterns (which also undergo double-quotish
substitution) there is a bad ambiguity: Is /$foo[bar]/ to
be interpreted as /${foo}[bar]/ (where [bar] is a character
class for the regular expression) or as /${foo[bar]}/ (where
[bar] is the subscript to array @foo)? If @foo doesn't oth-
erwise exist, then it's obviously a character class. If
@foo exists, perl takes a good guess about [bar], and is
almost always right. If it does guess wrong, or if you're
just plain paranoid, you can force the correct interpreta-
tion with curly brackets as above.

A line-oriented form of quoting is based on the shell here-
is syntax. Following a << you specify a string to terminate
the quoted material, and all lines following the current
line down to the terminating string are the value of the
item. The terminating string may be either an identifier (a
word), or some quoted text. If quoted, the type of quotes
you use determines the treatment of the text, just as in
regular quoting. An unquoted identifier works like double
quotes. There must be no space between the << and the iden-
tifier. (If you put a space it will be treated as a null
identifier, which is valid, and matches the first blank
line--see Merry Christmas example below.) The terminating
string must appear by itself (unquoted and with no surround-
ing whitespace) on the terminating line.

print <<EOF; # same as above
The price is $Price.
EOF

print <<"EOF"; # same as above
The price is $Price.
EOF

print << x 10; # null identifier is delimiter
Merry Christmas!

print <<`EOC`; # execute commands
echo hi there
echo lo there
EOC

print <<foo, <<bar; # you can stack them
I said foo.
foo
I said bar.
bar

Array literals are denoted by separating individual values
by commas, and enclosing the list in parentheses:

(LIST)

In a context not requiring an array value, the value of the
array literal is the value of the final element, as in the C
comma operator. For example,

@foo = ('cc', '-E', $bar);

assigns the entire array value to array foo, but

$foo = ('cc', '-E', $bar);

assigns the value of variable bar to variable foo. Note
that the value of an actual array in a scalar context is the
length of the array; the following assigns to $foo the value
3:

@foo = ('cc', '-E', $bar);
$foo = @foo; # $foo gets 3

You may have an optional comma before the closing
parenthesis of an array literal, so that you can say:

@foo = (
1,
2,
3,
);

When a LIST is evaluated, each element of the list is
evaluated in an array context, and the resulting array value
is interpolated into LIST just as if each individual element
were a member of LIST. Thus arrays lose their identity in a
LIST--the list

(@foo,@bar,&SomeSub)

contains all the elements of @foo followed by all the ele-
ments of @bar, followed by all the elements returned by the
subroutine named SomeSub.
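
A runnable version of that list, assuming a stand-in SomeSub
that returns two values, shows the three sources merging into
one flat array:

```perl
sub SomeSub { ('x', 'y'); }     # hypothetical: returns a two-element list

@foo = (1, 2, 3);
@bar = ('a', 'b');
@all = (@foo, @bar, &SomeSub);

$count = @all;                  # 7: the three sources have merged
print "$count: @all\n";
```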

A list value may also be subscripted like a normal array.
Examples:

$time = (stat($file))[8]; # stat returns array value
$digit = ('a','b','c','d','e','f')[$digit-10];
return (pop(@foo),pop(@foo))[0];


Array lists may be assigned to if and only if each element
of the list is an lvalue:

($a, $b, $c) = (1, 2, 3);

($map{'red'}, $map{'blue'}, $map{'green'}) = (0x00f, 0x0f0, 0xf00);

The final element may be an array or an associative array:

($a, $b, @rest) = split;
local($a, $b, %rest) = @_;

You can actually put an array anywhere in the list, but the
first array in the list will soak up all the values, and
anything after it will get a null value. This may be useful
in a local().
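
The soaking-up behaviour looks like this in practice (the
variable names are hypothetical):

```perl
($first, @middle, $last) = (1, 2, 3, 4);
$soaked = @middle;              # 3: @middle took 2, 3 and 4
$left = ($last eq '') ? 'null' : 'set';
print "$first | @middle | $left\n";
```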

An associative array literal contains pairs of values to be
interpreted as a key and a value:

# same as map assignment above
%map = ('red',0x00f,'blue',0x0f0,'green',0xf00);

Array assignment in a scalar context returns the number of
elements produced by the expression on the right side of the
assignment:

$x = (($foo,$bar) = (3,2,1)); # set $x to 3, not 2


There are several other pseudo-literals that you should know
about. If a string is enclosed by backticks (grave
accents), it first undergoes variable substitution just like
a double quoted string. It is then interpreted as a com-
mand, and the output of that command is the value of the
pseudo-literal, like in a shell. In a scalar context, a
single string consisting of all the output is returned. In
an array context, an array of values is returned, one for
each line of output. (You can set $/ to use a different
line terminator.) The command is executed each time the
pseudo-literal is evaluated. The status value of the com-
mand is returned in $? (see Predefined Names for the
interpretation of $?). Unlike in csh, no translation is
done on the return data--newlines remain newlines. Unlike
in any of the shells, single quotes do not hide variable
names in the command from interpretation. To pass a $
through to the shell you need to hide it with a backslash.

Evaluating a filehandle in angle brackets yields the next
line from that file (newline included, so it's never false
until EOF, at which time an undefined value is returned).
Ordinarily you must assign that value to a variable, but
there is one situation where an automatic assignment hap-
pens. If (and only if) the input symbol is the only thing
inside the conditional of a while loop, the value is
automatically assigned to the variable "$_". (This may seem
like an odd thing to you, but you'll use the construct in
almost every perl script you write.) Anyway, the following
lines are equivalent to each other:

while ($_ = <STDIN>) { print; }
while (<STDIN>) { print; }
for (;<STDIN>;) { print; }
print while $_ = <STDIN>;
print while <STDIN>;

The filehandles STDIN, STDOUT and STDERR are predefined.
(The filehandles stdin, stdout and stderr will also work
except in packages, where they would be interpreted as local
identifiers rather than global.) Additional filehandles may
be created with the open function.

If a <FILEHANDLE> is used in a context that is looking for
an array, an array consisting of all the input lines is
returned, one line per array element. It's easy to make a
LARGE data space this way, so use with care.
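
The difference between the two contexts can be sketched as
follows. The scratch file lines.tmp is an assumption made so
the example can run on its own; it is removed at the end.

```perl
# Hypothetical scratch file, removed again at the end.
$file = 'lines.tmp';
open(OUT, ">$file") || die "Can't create $file: $!";
print OUT "one\n", "two\n", "three\n";
close(OUT);

open(IN, $file) || die "Can't open $file: $!";
$first = <IN>;          # scalar context: just the next line
@rest  = <IN>;          # array context: every remaining line
close(IN);
unlink($file);

$count = @rest;
print "first: $first", "rest: $count lines\n";
```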

The null filehandle <> is special and can be used to emulate
the behavior of sed and awk. Input from <> comes either
from standard input, or from each file listed on the command
line. Here's how it works: the first time <> is evaluated,
the ARGV array is checked, and if it is null, $ARGV[0] is
set to '-', which when opened gives you standard input. The
ARGV array is then processed as a list of filenames. The
loop

while (<>) {
... # code for each line
}

is equivalent to

unshift(@ARGV, '-') if $#ARGV < $[;
while ($ARGV = shift) {
open(ARGV, $ARGV);
while (<ARGV>) {
... # code for each line
}
}

except that it isn't as cumbersome to say. It really does
shift array ARGV and put the current filename into variable
ARGV. It also uses filehandle ARGV internally. You can
modify @ARGV before the first <> as long as you leave the
first filename at the beginning of the array. Line numbers
($.) continue as if the input was one big happy file. (But
see example under eof for how to reset line numbers on each
file.)

If you want to set @ARGV to your own list of files, go right
ahead. If you want to pass switches into your script, you
can put a loop on the front like this:

while ($_ = $ARGV[0], /^^-/) {
shift;
last if /^^--$/;
/^^-D(.*)/ && ($debug = $1);
/^^-v/ && $verbose++;
... # other switches
}
while (<>) {
... # code for each line
}

The <> symbol will return FALSE only once. If you call it
again after this it will assume you are processing another
@ARGV list, and if you haven't set @ARGV, will input from
STDIN.

If the string inside the angle brackets is a reference to a
scalar variable (e.g. <$foo>), then that variable contains
the name of the filehandle to input from.
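
A sketch of the indirect form follows. The scratch file
indirect.tmp and the handle name LOG are assumptions made so
that the example is self-contained.

```perl
$file = 'indirect.tmp';         # hypothetical scratch file
open(OUT, ">$file") || die "Can't create $file: $!";
print OUT "alpha\n", "beta\n";
close(OUT);

open(LOG, $file) || die "Can't open $file: $!";
$handle = 'LOG';                # the variable holds the filehandle name
$seen = 0;
while (<$handle>) {             # same as while (<LOG>)
    $seen++;
}
close(LOG);
unlink($file);
print "read $seen lines through $handle\n";
```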

If the string inside angle brackets is not a filehandle, it
is interpreted as a filename pattern to be globbed, and
either an array of filenames or the next filename in the
list is returned, depending on context. One level of $
interpretation is done first, but you can't say <$foo>
because that's an indirect filehandle as explained in the
previous paragraph. You could insert curly brackets to
force interpretation as a filename glob: <${foo}>. Example:

while (<*.c>) {
chmod 0644, $_;
}

is equivalent to

open(foo, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
while (<foo>) {
chop;
chmod 0644, $_;
}

In fact, it's currently implemented that way. (Which means
it will not work on filenames with spaces in them unless you
have /bin/csh on your machine.) Of course, the shortest way
to do the above is:

chmod 0644, <*.c>;

!short: Syntax

Syntax

A perl script consists of a sequence of declarations and
commands. The only things that need to be declared in perl
are report formats and subroutines. See the sections below
for more information on those declarations. All uninitial-
ized user-created objects are assumed to start with a null
or 0 value until they are defined by some explicit operation
such as assignment. The sequence of commands is executed
just once, unlike in sed and awk scripts, where the sequence
of commands is executed for each input line. While this
means that you must explicitly loop over the lines of your
input file (or files), it also means you have much more con-
trol over which files and which lines you look at. (Actu-
ally, I'm lying--it is possible to do an implicit loop with
either the -n or -p switch.)

A declaration can be put anywhere a command can, but has no
effect on the execution of the primary sequence of
commands--declarations all take effect at compile time.
Typically all the declarations are put at the beginning or
the end of the script.

Perl is, for the most part, a free-form language. (The only
exception to this is format declarations, for fairly obvious
reasons.) Comments are indicated by the # character, and
extend to the end of the line. If you attempt to use /* */
C comments, it will be interpreted either as division or
pattern matching, depending on the context. So don't do
that.

!short: Compound statements

Compound statements

In perl, a sequence of commands may be treated as one com-
mand by enclosing it in curly brackets. We will call this a
BLOCK.

The following compound commands may be used to control flow:

if (EXPR) BLOCK
if (EXPR) BLOCK else BLOCK
if (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
LABEL while (EXPR) BLOCK
LABEL while (EXPR) BLOCK continue BLOCK
LABEL for (EXPR; EXPR; EXPR) BLOCK
LABEL foreach VAR (ARRAY) BLOCK
LABEL BLOCK continue BLOCK

Note that, unlike C and Pascal, these are defined in terms
of BLOCKs, not statements. This means that the curly brack-
ets are required--no dangling statements allowed. If you
want to write conditionals without curly brackets there are
several other ways to do it. The following all do the same
thing:

if (!open(foo)) { die "Can't open $foo: $!"; }
die "Can't open $foo: $!" unless open(foo);
open(foo) || die "Can't open $foo: $!"; # foo or bust!
open(foo) ? 'hi mom' : die "Can't open $foo: $!";
# a bit exotic, that last one


The if statement is straightforward. Since BLOCKs are
always bounded by curly brackets, there is never any ambi-
guity about which if an else goes with. If you use unless
in place of if, the sense of the test is reversed.
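
As an illustration, the hypothetical subroutine classify
below uses an if/elsif/else chain followed by an unless:

```perl
sub classify {                  # hypothetical helper
    local($n) = @_;
    local($kind);
    if ($n < 0)     { $kind = 'negative'; }
    elsif ($n == 0) { $kind = 'zero'; }
    else            { $kind = 'positive'; }
    unless ($n % 2) { $kind .= ' even'; }   # runs when $n % 2 is false
    $kind;
}

@kinds = (&classify(-3), &classify(0), &classify(8));
print join(', ', @kinds), "\n";
```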

The while statement executes the block as long as the
expression is true (does not evaluate to the null string or
0). The LABEL is optional, and if present, consists of an
identifier followed by a colon. The LABEL identifies the
loop for the loop control statements next, last, and redo
(see below). If there is a continue BLOCK, it is always
executed just before the conditional is about to be
evaluated again, similarly to the third part of a for loop
in C. Thus it can be used to increment a loop variable,
even when the loop has been continued via the next statement
(similar to the C "continue" statement).

If the word while is replaced by the word until, the sense
of the test is reversed, but the conditional is still tested
before the first iteration.
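
The sketch below (loop variables invented for the example)
shows a continue BLOCK still running after next, and an until
loop whose test is already true on entry:

```perl
# Sum the odd numbers below 10. next skips the rest of the body,
# but the continue block still runs, so $i always advances.
$i = 0;
$sum = 0;
while ($i < 10) {
    next unless $i % 2;
    $sum += $i;
} continue {
    $i++;
}

# until reverses the test, which is still made before the first pass.
$passes = 0;
until ($i >= 10) {              # already true, so the body never runs
    $passes++;
}
print "sum $sum, passes $passes\n";
```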

In either the if or the while statement, you may replace
"(EXPR)" with a BLOCK, and the conditional is true if the
value of the last command in that block is true.

The for loop works exactly like the corresponding while
loop:

for ($i = 1; $i < 10; $i++) {
...
}

is the same as

$i = 1;
while ($i < 10) {
...
} continue {
$i++;
}


The foreach loop iterates over a normal array value and sets
the variable VAR to be each element of the array in turn.
The variable is implicitly local to the loop, and regains
its former value upon exiting the loop. The "foreach" key-
word is actually identical to the "for" keyword, so you can
use "foreach" for readability or "for" for brevity. If VAR
is omitted, $_ is set to each value. If ARRAY is an actual
array (as opposed to an expression returning an array
value), you can modify each element of the array by modify-
ing VAR inside the loop. Examples:

for (@ary) { s/foo/bar/; }

foreach $elem (@elements) {
$elem *= 2;
}

for ((10,9,8,7,6,5,4,3,2,1,'BOOM')) {
print $_, "\n"; sleep(1);
}

for (1..15) { print "Merry Christmas\n"; }

foreach $item (split(/:[\\\n:]*/, $ENV{'TERMCAP'})) {
print "Item: $item\n";
}


The BLOCK by itself (labeled or not) is equivalent to a loop
that executes once. Thus you can use any of the loop con-
trol statements in it to leave or restart the block. The
continue block is optional. This construct is particularly
nice for doing case structures.

foo: {
if (/^^abc/) { $abc = 1; last foo; }
if (/^^def/) { $def = 1; last foo; }
if (/^^xyz/) { $xyz = 1; last foo; }
$nothing = 1;
}

There is no official switch statement in perl, because there
are already several ways to write the equivalent. In addi-
tion to the above, you could write

foo: {
$abc = 1, last foo if /^^abc/;
$def = 1, last foo if /^^def/;
$xyz = 1, last foo if /^^xyz/;
$nothing = 1;
}

or

foo: {
/^^abc/ && do { $abc = 1; last foo; };
/^^def/ && do { $def = 1; last foo; };
/^^xyz/ && do { $xyz = 1; last foo; };
$nothing = 1;
}

or

foo: {
/^^abc/ && ($abc = 1, last foo);
/^^def/ && ($def = 1, last foo);
/^^xyz/ && ($xyz = 1, last foo);
$nothing = 1;
}

or even

if (/^^abc/)
{ $abc = 1; }
elsif (/^^def/)
{ $def = 1; }
elsif (/^^xyz/)
{ $xyz = 1; }
else
{$nothing = 1;}

As it happens, these are all optimized internally to a
switch structure, so perl jumps directly to the desired
statement, and you needn't worry about perl executing a lot
of unnecessary statements when you have a string of 50
elsifs, as long as you are testing the same simple scalar
variable using ==, eq, or pattern matching as above. (If
you're curious as to whether the optimizer has done this for
a particular case statement, you can use the -D1024 switch
to list the syntax tree before execution.)

!short: Simple statements

Simple statements

The only kind of simple statement is an expression evaluated
for its side effects. Every expression (simple statement)
must be terminated with a semicolon. Note that this is like
C, but unlike Pascal (and awk).

Any simple statement may optionally be followed by a single
modifier, just before the terminating semicolon. The possi-
ble modifiers are:


if EXPR
unless EXPR
while EXPR
until EXPR

The if and unless modifiers have the expected semantics.
The while and until modifiers also have the expected seman-
tics (conditional evaluated first), except when applied to a
do-BLOCK or a do-SUBROUTINE command, in which case the block
executes once before the conditional is evaluated. This is
so that you can write loops like:

do {
$_ = <STDIN>;
...
} until $_ eq ".\n";

(See the do operator below. Note also that the loop control
commands described later will NOT work in this construct,
since modifiers don't take loop labels. Sorry.)
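
A small sketch of the modifiers, with made-up counters, shows
the difference between a plain statement and a do-BLOCK:

```perl
$n = 0;
$n++ while $n < 5;              # test first: stops when $n reaches 5
$n = 100 unless $n == 5;        # skipped, because $n is 5

# With do-BLOCK the body runs once before the first test.
$runs = 0;
do {
    $runs++;
} until 1;                      # true at once, yet the body ran

print "n $n, runs $runs\n";
```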


!short: Expressions

Expressions

The autoincrement operator has a little extra built-in magic
to it. If you increment a variable that is numeric, or that
has ever been used in a numeric context, you get a normal
increment. If, however, the variable has only been used in
string contexts since it was set, and has a value that is
not null and matches the pattern /^^[a-zA-Z]*[0-9]*$/, the
increment is done as a string, preserving each character
within its range, with carry:

print ++($foo = '99'); # prints '100'
print ++($foo = 'a0'); # prints 'a1'
print ++($foo = 'Az'); # prints 'Ba'
print ++($foo = 'zz'); # prints 'aaa'

The autodecrement is not magical.

The range operator (in an array context) makes use of the
magical autoincrement algorithm if the minimum and maximum
are strings. You can say

@alphabet = ('A' .. 'Z');

to get all the letters of the alphabet, or

$hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];

to get a hexadecimal digit, or

@z2 = ('01' .. '31'); print @z2[$mday];

to get dates with leading zeros. (If the final value speci-
fied is not in the sequence that the magical increment would
produce, the sequence goes until the next value would be
longer than the final value specified.)

The || and && operators differ from C's in that, rather than
returning 0 or 1, they return the last value evaluated.
Thus, a portable way to find out the home directory might
be:

$home = $ENV{'HOME'} || $ENV{'LOGDIR'} ||
(getpwuid($<))[7] || die "You're homeless!\n";
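
The same idiom works for any default value. In this sketch
the variables are hypothetical:

```perl
$given = '';
$name = $given || 'anonymous';  # first true operand, not just 1
$port = 0 || 8080;              # 8080
$both = 'left' && 'right';      # && returns the last value: 'right'
print "$name $port $both\n";
```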


Along with the literals and variables mentioned earlier, the
operations in the following section can serve as terms in an
expression. Some of these operations take a LIST as an
argument. Such a list can consist of any combination of
scalar arguments or array values; the array values will be
included in the list as if each individual element were
interpolated at that point in the list, forming a longer
single-dimensional array value. Elements of the LIST should
be separated by commas. If an operation is listed both with
and without parentheses around its arguments, it means you
can either use it as a unary operator or as a function call.
To use it as a function call, the next token on the same
line must be a left parenthesis. (There may be intervening
white space.) Such a function then has highest precedence,
as you would expect from a function. If any token other
than a left parenthesis follows, then it is a unary opera-
tor, with a precedence depending only on whether it is a
LIST operator or not. LIST operators have lowest pre-
cedence. All other unary operators have a precedence
greater than relational operators but less than arithmetic
operators. See the section on Precedence.


!short: Precedence

Precedence

Perl operators have the following associativity and pre-
cedence:

nonassoc print printf exec system sort reverse
chmod chown kill unlink utime die return
left ,
right = += -= *= etc.
right ?:
nonassoc ..
left ||
left &&
left | ^^
left &
nonassoc == != <=> eq ne cmp
nonassoc < > <= >= lt gt le ge
nonassoc chdir exit eval reset sleep rand umask
nonassoc -r -w -x etc.
left << >>
left + - .
left * / % x
left =~ !~
right ! ~ and unary minus
right **
nonassoc ++ --
left '('

As mentioned earlier, if any list operator (print, etc.) or
any unary operator (chdir, etc.) is followed by a left
parenthesis as the next token on the same line, the operator
and arguments within parentheses are taken to be of highest
precedence, just like a normal function call. Examples:

chdir $foo || die; # (chdir $foo) || die
chdir($foo) || die; # (chdir $foo) || die
chdir ($foo) || die; # (chdir $foo) || die
chdir +($foo) || die; # (chdir $foo) || die

but, because * is higher precedence than ||:

chdir $foo * 20; # chdir ($foo * 20)
chdir($foo) * 20; # (chdir $foo) * 20
chdir ($foo) * 20; # (chdir $foo) * 20
chdir +($foo) * 20; # chdir ($foo * 20)

rand 10 * 20; # rand (10 * 20)
rand(10) * 20; # (rand 10) * 20
rand (10) * 20; # (rand 10) * 20
rand +(10) * 20; # rand (10 * 20)

In the absence of parentheses, the precedence of list opera-
tors such as print, sort or chmod is either very high or
very low depending on whether you look at the left side of
the operator or the right side of it. For example, in

@ary = (1, 3, sort 4, 2);
print @ary; # prints 1324

the commas on the right of the sort are evaluated before the
sort, but the commas on the left are evaluated after. In
other words, list operators tend to gobble up all the argu-
ments that follow them, and then act like a simple term with
regard to the preceding expression. Note that you have to
be careful with parens:

# These evaluate exit before doing the print:
print($foo, exit); # Obviously not what you want.
print $foo, exit; # Nor is this.

# These do the print before evaluating exit:
(print $foo), exit; # This is what you want.
print($foo), exit; # Or this.
print ($foo), exit; # Or even this.

Also note that

print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance.
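
The reason is the rule given above: print is followed by a
left parenthesis, so only ($foo & 255) is taken as its
argument list. A sketch of one way round it, with an
arbitrary value for $foo:

```perl
$foo = 511;

# Parsed as (print($foo & 255)) + 1, "\n": the sum and newline are lost.
# print ($foo & 255) + 1, "\n";

# An extra pair of parentheses hands print the whole list.
$value = ($foo & 255) + 1;      # 255 + 1 = 256
print(($foo & 255) + 1, "\n");
```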

!short: Subroutines

Subroutines

A subroutine may be declared as follows:

sub NAME BLOCK

Any arguments passed to the routine come in as array @_,
that is ($_[0], $_[1], ...). The array @_ is a local array,
but its values are references to the actual scalar parame-
ters. The return value of the subroutine is the value of
the last expression evaluated, and can be either an array
value or a scalar value. Alternately, a return statement
may be used to specify the returned value and exit the sub-
routine. To create local variables see the local operator.

A subroutine is called using the do operator or the & opera-
tor.

Example:

sub MAX {
local($max) = pop(@_);
foreach $foo (@_) {
$max = $foo if $max < $foo;
}
$max;
}

...
$bestday = &MAX($mon,$tue,$wed,$thu,$fri);

Example:

# get a line, combining continuation lines
# that start with whitespace
sub get_line {
$thisline = $lookahead;
line: while ($lookahead = <STDIN>) {
if ($lookahead =~ /^^[ \t]/) {
$thisline .= $lookahead;
}
else {
last line;
}
}
$thisline;
}

$lookahead = <STDIN>; # get first line
while ($_ = do get_line()) {
...
}

Use array assignment to a local list to name your formal arguments:

sub maybeset {
local($key, $value) = @_;
$foo{$key} = $value unless $foo{$key};
}

This also has the effect of turning call-by-reference into
call-by-value, since the assignment copies the values.
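
To see the copying at work, here is a sketch with a
hypothetical subroutine trim_copy. It changes its own $text
but leaves the caller's variable alone:

```perl
sub trim_copy {                 # hypothetical helper
    local($text, $width) = @_;  # named, private copies of the arguments
    $text = substr($text, 0, $width);
    $text;
}

$title = 'Programming perl';
$short = &trim_copy($title, 11);
print "$short / $title\n";
```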

Subroutines may be called recursively. If a subroutine is
called using the & form, the argument list is optional. If
omitted, no @_ array is set up for the subroutine; the @_
array at the time of the call is visible to the subroutine
instead.

do foo(1,2,3); # pass three arguments
&foo(1,2,3); # the same

do foo(); # pass a null list
&foo(); # the same
&foo; # pass no arguments--more efficient
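
As a sketch of the last form, a routine can hand its own @_
to a helper without building a new argument list. The names
total and average are made up for this example.

```perl
# Adds up whatever is in @_ at the time of the call.
sub total {
    local($sum) = 0;
    foreach $n (@_) {
        $sum += $n;
    }
    $sum;
}

sub average {
    local($t) = &total;     # no argument list: total sees average's @_
    @_ ? $t / @_ : 0;       # @_ in a numeric context is its length
}

$avg = &average(2, 4, 9);   # 15 / 3
```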


!short: Passing by Reference

Passing By Reference

Sometimes you don't want to pass the value of an array to a
subroutine but rather the name of it, so that the subroutine
can modify the global copy of it rather than working with a
local copy. In perl you can refer to all the objects of a
particular name by prefixing the name with a star: *foo.
When evaluated, it produces a scalar value that represents
all the objects of that name, including any filehandle, for-
mat or subroutine. When assigned to within a local() opera-
tion, it causes the name mentioned to refer to whatever *
value was assigned to it. Example:

sub doubleary {
local(*someary) = @_;
foreach $elem (@someary) {
$elem *= 2;
}
}
do doubleary(*foo);
do doubleary(*bar);

Assignment to *name is currently recommended only inside a
local(). You can actually assign to *name anywhere, but the
previous referent of *name may be stranded forever. This
may or may not bother you.

Note that scalars are already passed by reference, so you
can modify scalar arguments without using this mechanism by
referring explicitly to the $_[nnn] in question. You can
modify all the elements of an array by passing all the ele-
ments as scalars, but you have to use the * mechanism to
push, pop or change the size of an array. The * mechanism
will probably be more efficient in any case.

Since a *name value contains unprintable binary data, if it
is used as an argument in a print, or as a %s argument in a
printf or sprintf, it then has the value '*name', just so it
prints out pretty.

Even if you don't want to modify an array, this mechanism is
useful for passing multiple arrays in a single LIST, since
normally the LIST mechanism will merge all the array values
so that you can't extract out the individual arrays.
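
A sketch of that use, assuming two hypothetical arrays @first
and @second. The routine append_all is not part of the perl
library; it only shows two arrays arriving intact and one of
them changing size.

```perl
sub append_all {
    local(*dest, *src) = @_;    # two separate arrays, not one merged LIST
    push(@dest, @src);          # changes the size of the caller's array
}

@first  = (1, 2);
@second = (3, 4, 5);
&append_all(*first, *second);
$joined = join(',', @first);    # @first now has five elements
```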

!short: Regular Expressions

Regular Expressions

The patterns used in pattern matching are regular expres-
sions such as those supplied in the Version 8 regexp rou-
tines. (In fact, the routines are derived from Henry
Spencer's freely redistributable reimplementation of the V8
routines.) In addition, \w matches an alphanumeric charac-
ter (including "_") and \W a nonalphanumeric. Word boun-
daries may be matched by \b, and non-boundaries by \B. A
whitespace character is matched by \s, non-whitespace by \S.
A numeric character is matched by \d, non-numeric by \D.
You may use \w, \s and \d within character classes. Also,
\n, \r, \f, \t and \NNN have their normal interpretations.
Within character classes \b represents backspace rather than
a word boundary. Alternatives may be separated by |. The
bracketing construct ( ... ) may also be used, in which case
\<digit> matches the digit'th substring. (Outside of the
pattern, always use $ instead of \ in front of the digit.
The scope of $<digit> (and $`, $& and $') extends to the end
of the enclosing BLOCK or eval string, or to the next pat-
tern match with subexpressions. The \<digit> notation some-
times works outside the current pattern, but should not be
relied upon.) You may have as many parentheses as you wish.
If you have more than 9 substrings, the variables $10, $11,
... refer to the corresponding substring. Within the pat-
tern, \10, \11, etc. refer back to substrings if there have
been at least that many left parens before the backrefer-
ence. Otherwise (for backward compatibility) \10 is the same
as \010, a backspace, and \11 the same as \011, a tab. And
so on. (\1 through \9 are always backreferences.)

$+ returns whatever the last bracket match matched. $&
returns the entire matched string. ($0 used to return the
same thing, but not any more.) $` returns everything before
the matched string. $' returns everything after the matched
string. Examples:

s/^^([^^ ]*) *([^^ ]*)/$2 $1/; # swap first two words

if (/Time: (..):(..):(..)/) {
$hours = $1;
$minutes = $2;
$seconds = $3;
}
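
As a further illustration (a sketch, not taken from the perl
distribution), a backreference inside the pattern can find a
doubled word, and the special variables then describe the
match. The sample sentence is invented.

```perl
$_ = "Paris in the the spring";

if (/\b(\w+) \1\b/) {       # \1 must repeat whatever (\w+) matched
    $word    = $1;          # the repeated word
    $before  = $`;          # everything before the match
    $matched = $&;          # the entire matched string
    $after   = $';          # everything after the match
}
```

With this input $word is "the", $matched is "the the", $before
is "Paris in " and $after is " spring".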

By default, the ^^ character is only guaranteed to match at
the beginning of the string, the $ character only at the end
(or before the newline at the end) and perl does certain
optimizations with the assumption that the string contains
only one line. The behavior of ^^ and $ on embedded newlines
will be inconsistent. You may, however, wish to treat a
string as a multi-line buffer, such that the ^^ will match
after any newline within the string, and $ will match before
any newline. At the cost of a little more overhead, you can
do this by setting the variable $* to 1. Setting it back to
0 makes perl revert to its old behavior.

To facilitate multi-line substitutions, the . character
never matches a newline (even when $* is 0). In particular,
the following leaves a newline on the $_ string:

$_ = <STDIN>;
s/.*(some_string).*/$1/;

If the newline is unwanted, try one of

s/.*(some_string).*\n/$1/;
s/.*(some_string)[^^\000]*/$1/;
s/.*(some_string)(.|\n)*/$1/;
chop; s/.*(some_string).*/$1/;
/(some_string)/ && ($_ = $1);

Any item of a regular expression may be followed with digits
in curly brackets of the form {n,m}, where n gives the
minimum number of times to match the item and m gives the
maximum. The form {n} is equivalent to {n,n} and matches
exactly n times. The form {n,} matches n or more times.
(If a curly bracket occurs in any other context, it is
treated as a regular character.) The * modifier is
equivalent to {0,}, the + modifier to {1,} and the ? modif-
ier to {0,1}. There is no limit to the size of n or m, but
large numbers will chew up more memory.
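
For instance, bounded counts can be combined with \b to accept
only strings of a particular shape. This is a small sketch;
the sample codes are invented.

```perl
@codes = ('AB12', 'A1', 'ABCD1234', 'AB123');

foreach $code (@codes) {
    # exactly two capital letters followed by two or three digits
    push(@good, $code) if $code =~ /\b[A-Z]{2}\d{2,3}\b/;
}

$accepted = join(',', @good);
```

Only AB12 and AB123 are accepted here.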

You will note that all backslashed metacharacters in perl
are alphanumeric, such as \b, \w, \n. Unlike some other
regular expression languages, there are no backslashed sym-
bols that aren't alphanumeric. So anything that looks like
\\, \(, \), \<, \>, \{, or \} is always interpreted as a
literal character, not a metacharacter. This makes it sim-
ple to quote a string that you want to use for a pattern but
that you are afraid might contain metacharacters. Simply
quote all the non-alphanumeric characters:

$pattern =~ s/(\W)/\\$1/g;
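
The effect of the quoting can be checked with a short sketch.
The strings used here are arbitrary examples.

```perl
$raw     = 'a.b*c';             # contains the metacharacters . and *
$pattern = $raw;
$pattern =~ s/(\W)/\\$1/g;      # $pattern is now a\.b\*c

# Unquoted, . matches any character and * repeats the b.
$loose  = ('aXbbbc' =~ /$raw/) ? 1 : 0;

# Quoted, only the literal text a.b*c can match.
$strict = ('aXbbbc' =~ /$pattern/) ? 1 : 0;
$exact  = ('xa.b*cx' =~ /$pattern/) ? 1 : 0;
```

Here $loose and $exact are 1, while $strict is 0.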


!short: Formats

Formats

Output record formats for use with the write operator may be
declared as follows:

format NAME =
FORMLIST
.

If name is omitted, format "STDOUT" is defined. FORMLIST
consists of a sequence of lines, each of which may be of one
of three types:

1. A comment.

2. A "picture" line giving the format for one output line.

3. An argument line supplying values to plug into a picture
line.

Picture lines are printed exactly as they look, except for
certain fields that substitute values into the line. Each
picture field starts with either @ or ^^. The @ field (not
to be confused with the array marker @) is the normal case;
^^ fields are used to do rudimentary multi-line text block
filling. The length of the field is supplied by padding out
the field with multiple <, >, or | characters to specify,
respectively, left justification, right justification, or
centering. As an alternate form of right justification, you
may also use # characters (with an optional .) to specify a
numeric field. (Use of ^^ instead of @ causes the field to
be blanked if undefined.) If any of the values supplied for
these fields contains a newline, only the text up to the
newline is printed. The special field @* can be used for
printing multi-line values. It should appear by itself on a
line.

The values are specified on the following line, in the same
order as the picture fields. The values should be separated
by commas.

Picture fields that begin with ^^ rather than @ are treated
specially. The value supplied must be a scalar variable
name which contains a text string. Perl puts as much text
as it can into the field, and then chops off the front of
the string so that the next time the variable is referenced,
more of the text can be printed. Normally you would use a
sequence of fields in a vertical stack to print out a block
of text. If you like, you can end the final field with ...,
which will appear in the output if the text was too long to
appear in its entirety. You can change which characters are
legal to break on by changing the variable $: to a list of
the desired characters.

Since use of ^^ fields can produce variable length records if
the text to be formatted is short, you can suppress blank
lines by putting the tilde (~) character anywhere in the
line. (Normally you should put it in the front if possible,
for visibility.) The tilde will be translated to a space
upon output. If you put a second tilde contiguous to the
first, the line will be repeated until all the fields on the
line are exhausted. (If you use a field of the @ variety,
the expression you supply had better not give the same value
every time forever!)

Examples:

# a report on the /etc/passwd file
format STDOUT_TOP =
                        Passwd File
Name                Login    Office   Uid   Gid Home
------------------------------------------------------------------
.
format STDOUT =
@<<<<<<<<<<<<<<<<<< @||||||| @<<<<<<@>>>> @>>>> @<<<<<<<<<<<<<<<<<
$name,              $login,  $office,$uid,$gid, $home
.

# a report from a bug report form
format STDOUT_TOP =
                        Bug Reports
@<<<<<<<<<<<<<<<<<<<<<<<     @|||         @>>>>>>>>>>>>>>>>>>>>>>>
$system,                      $%,         $date
------------------------------------------------------------------
.
format STDOUT =
Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
         $subject
Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
       $index,                       $description
Priority: @<<<<<<<<<< Date: @<<<<<<< ^^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
          $priority,        $date,   $description
From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      $from,                         $description
Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
             $programmer,            $description
~                                    ^^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                     $description
~                                    ^^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                     $description
~                                    ^^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                     $description
~                                    ^^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                     $description
~                                    ^^<<<<<<<<<<<<<<<<<<<<<<<...
                                     $description
.

It is possible to intermix prints with writes on the same
output channel, but you'll have to handle $- (lines left on
the page) yourself.

If you are printing lots of fields that are usually blank,
you should consider using the reset operator between
records. Not only is it more efficient, but it can prevent
the bug of adding another field and forgetting to zero it.

!short: Interprocess Communication

Interprocess Communication

The IPC facilities of perl are built on the Berkeley socket
mechanism. If you don't have sockets, you can ignore this
section. (Sockets are not yet supported on MS-DOS.)

!short: Predefined Names

Predefined Names

The following names have special meaning to perl. I could
have used alphabetic symbols for some of these, but I didn't
want to take the chance that someone would say reset
"a-zA-Z" and wipe them all out. You'll just have to suffer
along with these silly symbols. Most of them have
reasonable mnemonics, or analogues in one of the shells.


!short: Packages

Packages

Perl provides a mechanism for alternate namespaces to pro-
tect packages from stomping on each other's variables. By
default, a perl script starts compiling into the package
known as "main". By use of the package declaration, you can
switch namespaces. The scope of the package declaration is
from the declaration itself to the end of the enclosing
block (the same scope as the local() operator). Typically
it would be the first declaration in a file to be included
by the "require" operator. You can switch into a package in
more than one place; it merely influences which symbol table
is used by the compiler for the rest of that block. You can
refer to variables and filehandles in other packages by pre-
fixing the identifier with the package name and a single
quote. If the package name is null, the "main" package is
assumed.

Only identifiers starting with letters are stored in the
package's symbol table. All other symbols are kept in pack-
age "main". In addition, the identifiers STDIN, STDOUT,
STDERR, ARGV, ARGVOUT, ENV, INC and SIG are forced to be in
package "main", even when used for other purposes than their
built-in one. Note also that, if you have a package called
"m", "s" or "y", then you can't use the qualified form of an
identifier since it will be interpreted instead as a pattern
match, a substitution or a translation.
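
A brief sketch of two namespaces side by side follows. The
package name counter and the routine bump are hypothetical;
the single-quote qualifier is the one described above.

```perl
package counter;

$total = 0;                 # this is $counter'total

sub main'bump {             # compiled in counter, installed in main
    $total += $_[0];        # still refers to $counter'total
}

package main;

$total = 100;               # $main'total, a separate variable
&bump(5);
&bump(7);
$seen = $counter'total;     # qualified reference from package main
```

After the two calls $seen is 12, and the $total belonging to
package main is still 100.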

Eval'ed strings are compiled in the package in which the
eval was compiled. (Assignments to $SIG{}, however,
assume the signal handler specified is in the main package.
Qualify the signal handler name if you wish to have a signal
handler in a package.) For an example, examine perldb.pl in
the perl library. It initially switches to the DB package
so that the debugger doesn't interfere with variables in the
script you are trying to debug. At various points, however,
it temporarily switches back to the main package to evaluate
various expressions in the context of the main package.

The symbol table for a package happens to be stored in the
associative array of that name prepended with an underscore.
The value in each entry of the associative array is what you
are referring to when you use the *name notation. In fact,
the following have the same effect (in package main, any-
way), though the first is more efficient because it does the
symbol table lookups at compile time:

local(*foo) = *bar;
local($_main{'foo'}) = $_main{'bar'};

You can use this to print out all the variables in a pack-
age, for instance. Here is dumpvar.pl from the perl
library:

package dumpvar;

sub main'dumpvar {
($package) = @_;
local(*stab) = eval("*_$package");
while (($key,$val) = each(%stab)) {
{
local(*entry) = $val;
if (defined $entry) {
print "\$$key = '$entry'\n";
}
if (defined @entry) {
print "\@$key = (\n";
foreach $num ($[ .. $#entry) {
print " $num\t'",$entry[$num],"'\n";
}
print ")\n";
}
if ($key ne "_$package" && defined %entry) {
print "\%$key = (\n";
foreach $key (sort keys(%entry)) {
print " $key\t'",$entry{$key},"'\n";
}
print ")\n";
}
}
}
}

Note that, even though the subroutine is compiled in package
dumpvar, the name of the subroutine is qualified so that its
name is inserted into package "main".

!short: Style

Style

Each programmer will, of course, have his or her own prefer-
ences with regard to formatting, but there are some general
guidelines that will make your programs easier to read.

1. Just because you CAN do something a particular way
doesn't mean that you SHOULD do it that way. Perl is
designed to give you several ways to do anything, so
consider picking the most readable one. For instance

open(FOO,$foo) || die "Can't open $foo: $!";

is better than

die "Can't open $foo: $!" unless open(FOO,$foo);

because the second way hides the main point of the
statement in a modifier. On the other hand

print "Starting analysis\n" if $verbose;

is better than

$verbose && print "Starting analysis\n";

since the main point isn't whether the user typed -v or
not.

Similarly, just because an operator lets you assume
default arguments doesn't mean that you have to make use
of the defaults. The defaults are there for lazy sys-
tems programmers writing one-shot programs. If you want
your program to be readable, consider supplying the
argument.

Along the same lines, just because you can omit
parentheses in many places doesn't mean that you ought
to:

return print reverse sort num values array;
return print(reverse(sort num (values(%array))));

When in doubt, parenthesize. At the very least it will
let some poor schmuck bounce on the % key in vi.

Even if you aren't in doubt, consider the mental welfare
of the person who has to maintain the code after you,
and who will probably put parens in the wrong place.

2. Don't go through silly contortions to exit a loop at the
top or the bottom, when perl provides the "last" opera-
tor so you can exit in the middle. Just outdent it a
little to make it more visible:

line:
for (;;) {
statements;
last line if $foo;
next line if /^^#/;
statements;
}


3. Don't be afraid to use loop labels--they're there to
enhance readability as well as to allow multi-level loop
breaks. See last example.

4. For portability, when using features that may not be
implemented on every machine, test the construct in an
eval to see if it fails. If you know the version or
patchlevel in which a particular feature was implemented,
you can test $] to see if it will be there.

5. Choose mnemonic identifiers.

6. Be consistent.
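
Guideline 4 might be followed along these lines. This is
only a sketch: have_feature and no_such_routine are
hypothetical names, and the version threshold 4.010 is an
arbitrary figure chosen for illustration.

```perl
# Returns 1 if the code in $code can be evaluated without a
# fatal error, 0 otherwise.
sub have_feature {
    local($code) = @_;
    eval $code;
    $@ eq '' ? 1 : 0;
}

# symlink is fatal on machines that do not implement it.
$can_symlink = &have_feature('symlink("","");');

# A call to a routine that does not exist always fails.
$can_bogus   = &have_feature('&no_such_routine();');

# $] in a numeric context can be compared against a known version.
$new_enough  = ($] >= 4.010) ? 1 : 0;
```

The script can then branch on $can_symlink instead of dying
on a machine where the operator is missing.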


!short: Setuid Scripts

Setuid Scripts

Setuid processes are not supported on MS-DOS. Perl is
designed to make it easy to write secure setuid and setgid
scripts. Unlike shells, which are based on multiple substi-
tution passes on each line of the script, perl uses a more
conventional evaluation scheme with fewer hidden "gotchas".
Additionally, since the language has more built-in func-
tionality, it has to rely less upon external (and possibly
untrustworthy) programs to accomplish its purposes.

In an unpatched 4.2 or 4.3bsd kernel, setuid scripts are
intrinsically insecure, but this kernel feature can be dis-
abled. If it is, perl can emulate the setuid and setgid
mechanism when it notices the otherwise useless setuid/gid
bits on perl scripts. If the kernel feature isn't disabled,
perl will complain loudly that your setuid script is
insecure. You'll need to either disable the kernel setuid
script feature, or put a C wrapper around the script.

When perl is executing a setuid script, it takes special
precautions to prevent you from falling into any obvious
traps. (In some ways, a perl script is more secure than the
corresponding C program.) Any command line argument,
environment variable, or input is marked as "tainted", and
may not be used, directly or indirectly, in any command that
invokes a subshell, or in any command that modifies files,
directories or processes. Any variable that is set within
an expression that has previously referenced a tainted value
also becomes tainted (even if it is logically impossible for
the tainted value to influence the variable). For example:

$foo = shift; # $foo is tainted
$bar = $foo,'bar'; # $bar is also tainted
$xxx = <>; # Tainted
$path = $ENV{'PATH'}; # Tainted, but see below
$abc = 'abc'; # Not tainted

system "echo $foo"; # Insecure
system "/bin/echo", $foo; # Secure (doesn't use sh)
system "echo $bar"; # Insecure
system "echo $abc"; # Insecure until PATH set

$ENV{'PATH'} = '/bin:/usr/bin';
$ENV{'IFS'} = '' if $ENV{'IFS'} ne '';

$path = $ENV{'PATH'}; # Not tainted
system "echo $abc"; # Is secure now!

open(FOO,"$foo"); # OK
open(FOO,">$foo"); # Not OK

open(FOO,"echo $foo|"); # Not OK, but...
open(FOO,"-|") || exec 'echo', $foo; # OK

$zzz = `echo $foo`; # Insecure, zzz tainted

unlink $abc,$foo; # Insecure
umask $foo; # Insecure

exec "echo $foo"; # Insecure
exec "echo", $foo; # Secure (doesn't use sh)
exec "sh", '-c', $foo; # Considered secure, alas

The taintedness is associated with each scalar value, so
some elements of an array can be tainted, and others not.

If you try to do something insecure, you will get a fatal
error saying something like "Insecure dependency" or
"Insecure PATH". Note that you can still write an insecure
system call or exec, but only by explicitly doing something
like the last example above. You can also bypass the taint-
ing mechanism by referencing subpatterns--perl presumes that
if you reference a substring using $1, $2, etc, you knew
what you were doing when you wrote the pattern:

$ARGV[0] =~ /^^-P(\w+)$/;
$printer = $1; # Not tainted

This is fairly secure since \w+ doesn't match shell meta-
characters. Use of .+ would have been insecure, but perl
doesn't check for that, so you must be careful with your
patterns. This is the ONLY mechanism for untainting user
supplied filenames if you want to do file operations on them
(unless you make $> equal to $<).
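
The same idea can be wrapped in a small routine. This is a
sketch under the assumption that only word characters are
acceptable; untaint_word is a hypothetical name, and tainting
itself is only in effect when perl runs a setuid script.

```perl
# Returns the argument, referenced through $1, if it consists
# solely of word characters; returns '' otherwise.
sub untaint_word {
    local($arg) = @_;
    return '' if $arg eq '' || $arg =~ /\W/;   # reject anything else
    $arg =~ /(\w+)/;                           # whole string is \w here
    $1;                                        # subpattern reference
}

$printer  = &untaint_word('lp0');
$rejected = &untaint_word('lp0; rm -rf /');
```

Here $printer is "lp0" and $rejected is the null string, since
the second argument contains shell metacharacters.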

It's also possible to get into trouble with other operations
that don't care whether they use tainted values. Make judi-
cious use of the file tests in dealing with any user-
supplied filenames. When possible, do opens and such after
setting $> = $<. Perl doesn't prevent you from opening
tainted filenames for reading, so be careful what you print
out. The tainting mechanism is intended to prevent stupid
mistakes, not to remove the need for thought.


!short: MS-DOS Considerations

MS-DOS CONSIDERATIONS

This section describes the MS-DOS version of perl. MS-DOS
perl has been tested with MS-DOS versions 3.2, 3.3, 4.01,
and 5.0, and should work with 2.0 or higher. 640 K-bytes
are needed to get anything done. MS-DOS 5.0 running in high
memory is recommended.

Command-Line Arguments in MS-DOS

MS-DOS perl recognizes the MKS calling conventions. The MKS
Korn shell will expand the command-line wildcards and pass
perl arguments in a manner similar to Unix. Large numbers
of arguments can be passed this way, and -e commands can be
enclosed in single quotes just as in Unix. This works even
if $MKSARGS (see the section on the environment below) is
not set, but not if MKS support is compiled out.

When invoked from a non-MKS tool with MKS tooling present,
perl will use the MKS glob program (see $GLOB in the section
on the MS-DOS environment, below) to expand the wildcards.

When run on a system that doesn't have MKS tools, perl will
attempt to expand wildcards it sees in the argument list.
It is not possible to handle -e arguments properly due to
limitations in the standard MS-DOS calling conventions.

Unix and MKS do globbing differently than MS-DOS. Under
Unix and MKS, "abc*" matches abc.c. Under DOS, though, the
* won't match "." The MKS tools use the former interpreta-
tion; the perlglob.exe program distributed with perl and the
globbing built in to perl.exe use the latter. (Don't
install the distributed perlglob.exe if you are using the
MKS tools.)

File names in MS-DOS

Perl scripts should use forward slashes to delimit path name
components. (C programmers should note that most MS-DOS com-
pilers support this, too.) MS-DOS perl will work with
backslashes, but they make code unnecessarily messy since
they must be escaped any time they might be interpreted to
mean an escape sequence. Subprograms may or may not like
forward slashes: most programs compiled from C source like
them fine, but command.com won't take them. Slashes can be
reversed before invoking a subprocess using the perl substi-
tution features. Inside perl, the following are equivalent:

"c:/new/report"
"c:\\new\\report"

Drive prefixes are optional. If omitted, the file is sought
on the current drive.
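
Reversing the slashes for a subprogram that insists on
backslashes takes one substitution. A minimal sketch, using
the path from the example above; the variable names are
illustrative.

```perl
$path = "c:/new/report";

# Copy the path, then turn every forward slash into a backslash.
($dospath = $path) =~ s,/,\\,g;     # $dospath is c:\new\report
```

$path keeps its forward slashes for use inside perl, while
$dospath can be handed to the subprocess.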

Globbing <> Expressions on MS-DOS

Expressions like <*.c> are expanded by a subshell in Unix.
On MS-DOS there are two cases. If $MKSARGS is defined in
the environment and MKS support is compiled in, the $SHELL
is used, running echo piped to tr as with the Bourne shell
on Unix perl. Otherwise, the supplied perlglob.exe program
is run. This program simply takes the supplied command
line, DOS-globs it, and writes the result, null-separated,
to the perl process via a "pipe."

The MS-DOS Environment

MS-DOS perl uses a number of environment variables not used
by Unix perl. This is because the MS-DOS version is widely
distributed in executable form. (Not everyone who wishes to
use perl on MS-DOS has the tools to build it from the source
code.) Directory and file names in the environment can be
delimited with forward or back slashes. Environment vari-
ables are interpreted when they are needed. (Exception:
creation of $TMP from $TMPDIR is done at startup.) Thus the
perl script can change them. For example, if a huge "pipe"
file is to be created, and there's room on the hard disk
(c:) but not the usual $TMP, which is a RAM disk, one could
write:

$ENV{'TMP'} = "c:/tmp";
open (HANDLE, "program|");

The state of the environment passed to perl.exe determines
argument parsing, though, since this is done before compila-
tion and interpretation of the perl script.

PATH This is the semicolon-separated path list. The
current directory is searched first unless
$MKSARGS is defined and MKS support is compiled
in, in which case the current directory is
searched only in response to an explicit "." or
null path component.

MKSARGS Set to 1 to enable all MKSisms. The MKS tools can
assume that the other pieces of the toolkit are
lying around, but perl.exe can't. (It may be
the only MKS-compatible program on a system.)
If you have the MKS tools, you should set this
variable. Perl will recognize and generate MKS
compatible arguments in any case, but without
the switch will default SLASHC to "-c" instead
of "-ce", will fail to run MKS glob, and will
run perl's glob instead of the Korn shell to
expand expressions like <*.c>. (Note that MKS
support can be compiled out.)

ROOTDIR Full drive and path where MKS toolkit is
installed. Example: "c:/mks" or "d:/". Typi-
cally it is already set in an MKS environment. Used
only if $MKSARGS is set and $GLOB is not. See
$GLOB. Ignored if MKS support compiled out.

TMP First choice for temporary files, e.g.,
"h:\\tmp". If not set, $TMPDIR is used (see
below); if that is not set either, the current
directory is used. Swapping also goes here unless
$EXESWAP is defined. Temporary files are
pseudo-pipes, the swap file, and the -e file.

TMPDIR If $TMPDIR is set and $TMP is not, the following
is done internally:

( $ENV{'TMP'} = $ENV{'TMPDIR'} ) =~ s,/,\\,g;

(Forward slashes are converted to backslashes as a
gesture to descendants of the perl process.) Creation of
$ENV{'TMP'} from $TMPDIR is done at perl.exe
startup. Note that the MKS tools use $TMPDIR as
a first choice; as a gesture of compatibility
for non-MKS users, here, it is a second choice.

EXESWAP First choice for swap out file location. A RAM
disk is a nice choice. $TMP is used if this
isn't set. (See also $TMPDIR). The swap file is
created the first time swapping is invoked and
is left open until perl exits or does an exec.
Set to ".off" (note illegal DOS name) to inhibit
swapping--useful for speedy running of small
subprocesses.
This feature (inhibition) can be turned on and
off. The following example runs "ls" using
"e:/tmp" as the directory for the swap file. It
then runs "who," with swapping disabled.
Finally, it runs "ps" with swapping re-enabled.
Note that $ENV{'EXESWAP'} is set to 'yes' but
anything other than '.off' would have sufficed.

$ENV{'EXESWAP'} = 'e:/tmp';
system "ls";
$ENV{'EXESWAP'} = '.off';
system "who";
$ENV{'EXESWAP'} = 'yes';
system "ps";

GLOB First choice for MKS globbing program: full
path, name, and extension. Example:
"d:/mks/etc/glob.exe". The perl globbing pro-
gram (used for <*.c> expansion) is found, as
before, via the $PATH. Used only in an MKS
environment, and then only when perl is run from
a non-MKS program. Ignored if MKS support com-
piled out.

SHELL Full path name and extension of the shell
used for subprocesses when wildcard expansion is
required, e.g., "c:/mks/bin/sh.exe". If unde-
fined, COMSPEC is used. Presumably this could
be the MKS korn shell, but it can be another
shell (e.g., 4DOS) and thus $SHELL is inspected
even if MKS support is compiled out.

COMSPEC Full path name of DOS command interpreter, e.g.,
"c:\\command.com". Used only if $SHELL is not
defined. If not found, "\\command.com" is used.
(It is bad practice to allow $COMSPEC to default
or to have it have anything other than a full
drive and path name. You don't want your pro-
grams looking for command.com on alternate
drives.)

METACHAR List of characters that are metacharacters to
the $SHELL or $COMSPEC. Used to determine if
command can be run directly or if a subshell
must be invoked. If undefined, \|<> is used for
COMSPEC and *"?<>][`'\ for SHELL.

SLASHC The shell option for invoking a command.
Defaults: /c for $COMSPEC, MS-DOS version 4.x
or better [sic]; sc, where s is the switch char-
acter, for $COMSPEC, DOS < 4.0; -ce for $SHELL
if $MKSARGS is set and MKS support is compiled
in. (The -e is needed to get the MKS Korn shell to
return the status properly.) The default is -c
for other $SHELLs. (This is a guess.)

PERLLIB Directory(ies) containing perl library.
Defaults to /usr/local/lib/perl. Directories
are separated like $PATH: semicolons in MS-DOS,
colons in Unix. For the -P switch, only the
first $PERLLIB directory (or the default, if
there's no $PERLLIB) is tried.

Typical MKS setup (profile.ksh)

# If you're using the MKS stuff, you probably don't have
# to do anything other than set MKSARGS and PERLLIB.

export MKSARGS=1
# ROOTDIR set by init process or etc.rc or here.
export TMPDIR=e:/tmp
# EXESWAP left to default to $TMPDIR
# GLOB left to default to $ROOTDIR/etc/glob.exe
# SHELL set by init process or here.
# COMSPEC not used by perl.exe but probably defined for other uses.
# METACHAR not defined, left to default.
export PERLLIB='c:/lib/perl;d:/usr/me/myperlib'

Typical non-MKS setup (autoexec.bat)

Rem You probably don't need to do anything except set $TMP,
Rem which you may be doing anyway.

Rem MKSARGS not set
Rem ROOTDIR not used
set TMP=d:mp
Rem EXESWAP left to default to $TMP
Rem GLOB not used
Rem SHELL not set
Rem COMSPEC set by config.sys SHELL command or by MS-DOS startup.
Rem METACHAR not defined, left to default.
set PERLLIB=c:/lib/perl

Running Subprocesses on MS-DOS

Perl will by default swap itself out almost entirely when it
runs a subprocess other than MKS $GLOB. (See the $EXESWAP
environment variable). The swap file is opened when the
first subprocess is run and is left open until perl exits or
does an exec. The command line to be run is scanned for
$METACHARacters as described above. If none are found, the
subcommand is invoked directly. If metacharacter(s) are
found, a $SHELL or $COMSPEC is invoked to run the command.
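
The scan described above might look roughly like this if it
were written in perl. This is only an illustration of the
idea, not the code perl.exe uses; needs_shell is a
hypothetical name, and the metacharacter list shown is the
default given for COMSPEC.

```perl
# Returns 1 if $cmd contains any character listed in $meta,
# meaning a shell would have to be invoked; 0 otherwise.
sub needs_shell {
    local($cmd, $meta) = @_;
    local($i);
    for ($i = 0; $i < length($meta); $i++) {
        return 1 if index($cmd, substr($meta, $i, 1)) >= 0;
    }
    0;
}

$meta     = '\\|<>';                            # the four characters \ | < >
$redirect = &needs_shell('dir >my.fil', $meta); # 1: contains >
$plain    = &needs_shell('ver', $meta);         # 0: run directly
```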

Use of a single | in an open() command does not constitute a
metacharacter: this is a directive to perl to open a pipe.
The following, too, has no SHELL metacharacters since the
subprocess is simply pwd:

chop($direct = `pwd`);

Beware of MS-DOS "internal" commands; i.e., those that are
built into command.com. Examples are DIR and COPY.
COMMAND.COM users can use these directly if the command has
$METACHARacters; if not, you must invoke an explicit
command.com. In the first example below, '>' is a metachar-
acter. In the second example, there are no metacharacters
and so the internal "ver" command must be run with an
explicit $COMSPEC.

system "dir >my.fil";
system "$ENV{'COMSPEC'} /c ver";

Users of $SHELLs other than COMMAND.COM must use the second
format for anything to be passed to command.com.

Be aware that no wild card expansion is going to be done by
command.com unless you're using one of the built-in commands
that does it (e.g., COPY). You can use <> expansion to get
around this. Even those who have fancy $SHELLs should take
note of this, since having perl run the $SHELL and then the
command uses less memory than if perl runs the $SHELL which
runs the command:

@files = <*.c>;
`the_com @files`;


!short: Environment

ENVIRONMENT

Perl uses PATH in executing subprocesses, and in finding the
script if -S is used. HOME or LOGDIR are used if chdir has
no argument.

Apart from these, Unix perl uses no environment variables,
except to make them available to the script being executed,
and to child processes.