Text of Part 2 of PERL language from June 1990 UNIX Review.

UNIX Review July 1990 volume 8 number 6 p79(5)


by Rob Kolstad (Daemons and Dragons column)

Last month, this column introduced PERL, the "Practical Extraction and Report
Language". PERL combines all the best features of C, sed, awk, shell
programming, database access, and text manipulation into one giant,
kitchen-sink language.

This month, after a quick recap, we'll cover regular expressions, directory
access, and subroutines. Next month, we'll cover built-in functions, system
calls, dbm databases, and present even more examples.

Last Month

Last month, we covered PERL's three fundamental scalar data types: numeric,
boolean, and string. All references to scalar variables in PERL are of the
form $name, where name is the variable's name. PERL resembles awk in that
variables are typed by their most recent assignment and require no previous
declaration. PERL uses print and printf for output.

PERL also has composite variables. Vectors are one-dimensional and are indexed
numerically from 0 (by default) with square brackets surrounding the index.
Single values (scalars or single-element vectors) are accessed with the $
(dollar-sign) prefix. Naming an array with an @ (at-sign) prefix references
the entire array.

PERL also supports associative arrays that use a list (possibly with only one
element) of scalar data types as their index. References to associative arrays
require curly braces instead of square brackets.
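
As a refresher on that notation, here is a minimal sketch. It assumes a current Perl 5 interpreter (which postdates this column), and the variable names and sample data are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data, invented for this sketch.
my @langs = ("sed", "awk", "perl");      # @ names the whole vector
my %born  = ("sed", 1974, "awk", 1977);  # % names the whole associative array

# Return a one-line description of element $i of @langs.
sub describe {
    my ($i) = @_;
    my $name = $langs[$i];               # $ with [] selects one vector element
    my $year = exists $born{$name}       # $ with {} selects one keyed element
        ? $born{$name}
        : "unknown";
    return "$name: $year";
}

print describe(0), "\n";
print describe(2), "\n";
```

The second call shows what happens when a key is absent: the sketch checks with exists rather than relying on an undefined value.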

PERL has a number of predefined variables (such as the current process ID and
a vector of invocation arguments).

PERL uses all of C's operators except the type-casting and address operators
(& and *). Additionally, PERL has exponentiation (**, **=), the range operator
(..), string concatenation (., .=), string repetition (x, x=), and (as does
the UNIX shell) file tests.
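
The additional operators are easiest to see side by side. The following is only a sketch for a current Perl 5 interpreter (an assumption); the underline helper and its sample title are made up for the example.

```perl
use strict;
use warnings;

# Build a title underlined with dashes, using . (concatenation)
# and x (repetition). The helper name is hypothetical.
sub underline {
    my ($title) = @_;
    return $title . "\n" . "-" x length($title) . "\n";
}

# ** is exponentiation; .. builds the list 0, 1, 2, 3, 4.
my @powers = map { 2 ** $_ } (0 .. 4);

print underline("Powers of two");
print "@powers\n";

# A shell-style file test: -r asks whether the path is readable.
print "current directory is readable\n" if -r ".";
```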

PERL has flow-control constructs very similar to those of C and awk (with the
exception that PERL lacks a case statement). Unlike C, however, PERL's control
constructs always require a set of enclosing braces. PERL uses next and last
instead of C's continue and break.
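
A short sketch of next and last in a loop follows. It is written for a current Perl 5 interpreter (an assumption), and the function name and numbers are invented for illustration.

```perl
use strict;
use warnings;

# Collect the even numbers from the argument list, stopping at the
# first negative value. Note the braces around every body.
sub evens_before_negative {
    my @out;
    foreach my $n (@_) {
        if ($n < 0)      { last; }   # leaves the loop, like C's break
        if ($n % 2 != 0) { next; }   # skips to the next item, like C's continue
        push(@out, $n);
    }
    return @out;
}

print join(" ", evens_before_negative(2, 3, 4, 7, 8, -1, 10)), "\n";
```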

PERL can open files for reading, writing, or appending. Additionally, PERL can
open pipes (either for reading from or writing to). PERL has simple input
statements (such as <>) and simple output (print) in addition to the standard
C-style printf statement.

PERL often uses the variable $_ as a default variable. When no specification
is made for the result of input operations, for example, the input goes to $_.
Functions for which arguments are omitted often operate on $_ (for example,
chop, the function that removes the last character of a string, will operate
on $_).
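
Here is a small sketch of $_ at work. It assumes a current Perl 5 interpreter, reads from an in-memory file so that it is self-contained, and uses chomp, the later newline-only relative of chop. The function name is hypothetical.

```perl
use strict;
use warnings;

# Read lines from a string through a filehandle, letting $_ carry each line.
sub read_lines {
    my ($text) = @_;
    my @lines;
    open(my $fh, '<', \$text) or die "Couldn't open in-memory file: $!";
    while (<$fh>) {       # no target given, so each line goes to $_
        chomp;            # no argument given, so this operates on $_
        push(@lines, $_);
    }
    close($fh);
    return @lines;
}

print join("|", read_lines("first\nsecond\n")), "\n";
```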

Regular Expressions

PERL's regular expressions are similar to those of egrep (a slightly more
powerful utility with a few syntactic quirks compared to grep). PERL's regular
expressions support ., *, [], and C-style escapes. New escapes include \w,
which matches alphanumerics and _ (the underscore), \d, which matches numeric
characters, \s, which matches white space, and \b, which matches word
boundaries. Capitalized versions of these letters (\W, \D, \S, and \B) match
their respective negations. Like egrep (and unlike grep), (), {}, and + do not
have to be escaped for their special meanings.
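
To make the new escapes concrete, here is a hedged sketch that sorts strings into rough categories. It targets a current Perl 5 interpreter (an assumption); the classify function and its category names are invented for the example.

```perl
use strict;
use warnings;

# Classify a string using the escapes described above.
# The first matching test wins.
sub classify {
    my ($s) = @_;
    if ($s =~ /^\d+$/)    { return "digits"; }         # \d: numeric characters
    if ($s =~ /^\w+$/)    { return "word"; }           # \w: alphanumerics and _
    if ($s =~ /\bperl\b/) { return "mentions perl"; }  # \b: word boundaries
    if ($s =~ /^\s*$/)    { return "blank"; }          # \s: white space
    return "other";
}

foreach my $s ("1990", "unix_review", "learn perl now", "   ", "superperl!") {
    print "[$s] ", classify($s), "\n";
}
```

The last sample falls through to "other" because perl is not at a word boundary inside superperl.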

Use =~ and s for substitution:

$foo = "alphabet";
$foo =~ s/alpha/beta/;
print $foo . "\n";

which produces:

betabet

Use =~ and y for translation:

$foo = "genafyngvba";
$foo =~ y/a-z/n-za-m/; # rot13
print "$foo\n";

which produces:

translation

With these two commands, PERL subsumes most of sed's and tr's abilities.

PERL sets three very useful variables after a match. These include:

$` - the text of the string before any matched part;
$& - the text of the string that was matched;
$' - the text of the string after any matched part.

Here is an example of these variables' use:

$foo = "Now is the time";
$foo =~ /is/;
print "Before: =$`=\n";
print "Match: =$&=\n";
print "After: =$'=\n";

which produces:

Before: =Now =
Match: =is=
After: = the time=

Using the =~ operator, PERL searches strings for a regular-expression match
(or search failure with the !~ operator). Here's an example that searches for
the string ous in two other strings:

if ( "Houston" =~ /ous/ )
{ print "Houston: yes\n"; }
if ( "Houston" !~ /ous/ )
{ print "Houston: no\n"; }
if ( "Dallas" =~ /ous/ )
{ print "Dallas: yes\n"; }
if ( "Dallas" !~ /ous/ )
{ print "Dallas: no\n"; }

which yields (not surprisingly):

Houston: yes
Dallas: no


Directory Access

PERL can access directories three different ways. The obvious way of finding
all the files with a .c extension is:

open (FILES, "/bin/ls *.c |");
while ($file = <FILES>) {
    chop($file); # strip the trailing newline
}

Assuming files exist, a new filename appears in $file each time <FILES> is
mentioned. The chop command removes the newline from the end of the filename
(the ls program put the newline there, of course).

The previous method requires a shell (to do the filename expansion) and a
process to run the ls program. The readdir function provides a list of the
names found in an open file handle (presumably a directory). Here's another
way of accessing a directory, using readdir:

if (!opendir(DIR, "/tmp"))
{ die "Couldn't open /tmp"; }
foreach $i (readdir(DIR))
{ print "$i\n"; }

This is pretty much identical to:

% ls /tmp

The first two lines show a safe way to open a file or directory. They are not
so useful for opening a pipe, since pipe opens return process IDs that are
always non-zero. The foreach control mechanism evaluates readdir only once,
then loops through its body setting $i to each list element returned by
readdir.
Probably the easiest and fastest way to read a directory is to use PERL's
filename-expansion notation. A string that is enclosed in angle brackets and
that contains filename-expansion metacharacters evaluates to a list of
filenames that match those characters. Here's the same program done this new

foreach %x ( <*> )
{ print $x\n; }

Many VMS programmers move their FORTRAN files to UNIX and are chagrined to find
that the command:

% mv *.FOR *.f

doesn't quite do what they'd like. Usually, it bombs off with a message about
some FORTRAN file not being a directory. In the worst case, however, the
directory with these files contains precisely two files with the .FOR
extension. In this case, the shell expands the metacharacters so that the
command line becomes:

% mv ONE.FOR TWO.FOR

The shell then happily invokes the rename command (mv) and quickly blows away
TWO.FOR altogether and echoes a new prompt, eagerly awaiting the next command.

A PERL program to do this is fairly simple:

foreach $x (<*.FOR>) {
    $y = $x;
    $y =~ s/FOR$/f/;
    rename ($x, $y);
}

The rename function is just one of many system calls to which PERL has access.
We could have used:

system ("mv $x $y");

or

`mv $x $y`;

at the cost of several process forks.
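
For everyday use one might want the renaming loop to check its work. The sketch below assumes a current Perl 5 interpreter; the function names, the refusal to overwrite existing files, and the warning messages are all choices made for this example.

```perl
use strict;
use warnings;

# Map a name ending in .FOR to the same name ending in .f;
# any other name passes through unchanged.
sub new_name {
    my ($old) = @_;
    (my $new = $old) =~ s/\.FOR$/.f/;
    return $new;
}

# Rename every *.FOR file in the current directory and return the
# number of files renamed. Existing targets are never overwritten.
sub rename_all {
    my $renamed = 0;
    foreach my $old (glob("*.FOR")) {
        my $new = new_name($old);
        if (-e $new) {
            warn "skipping $old: $new already exists\n";
            next;
        }
        if (rename($old, $new)) { $renamed++; }
        else { warn "couldn't rename $old to $new: $!\n"; }
    }
    return $renamed;
}
```

Calling rename_all() from the directory of interest performs the renames; new_name can be tried on its own first to preview the result.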


Subroutines

The ability to write subroutines and functions in PERL eases coding and can
encourage the writing of reusable code. PERL subroutines and functions use sub
in their declarations, and subroutines are invoked using do, like this:

sub test {
    print "Hello world\n";
}
do test (); # parens are required

Including a return line turns the subroutine into a function. While one can
invoke functions with the do construct, PERL encourages the use of the &
invocation scheme to recover the value of the function:

sub f { return 23; }
print &f() . "\n";

The & and do constructs are interchangeable except that using the & makes the
invocation an expression that has a value that can be used later in the
program.

Most people feel that parameters greatly enhance the usefulness of
subprograms. Parameters appear in subprograms as elements of the global array
@_. The program below exemplifies this kind of argument passing:

sub add { return $_[0]+$_[1]; }
print &add(4, 9) . "\n";

which yields the obvious:

13

Note, by the way, that one accesses elements of the @_ array using $_[index].
The namespaces for scalars and vectors are separate, so the commonly used
scalar $_ is distinct (and its value preserved).
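
A sketch that exercises both names at once may help. It assumes a current Perl 5 interpreter, and the total function and its sample values are invented for illustration.

```perl
use strict;
use warnings;

# Sum any number of arguments by walking the whole @_ array.
sub total {
    my $sum = 0;
    foreach my $arg (@_) { $sum += $arg; }
    return $sum;
}

$_ = "untouched";            # the scalar $_ ...
my $t = total(4, 9, 10);     # ... is separate from the array @_ inside total
print "$t $_\n";
```

Because the loop walks @_ rather than indexing $_[0] and $_[1], the same function accepts two arguments or twenty.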

The fact that the subprogram argument-passing array @_ is global does,
however, lead to problems in recursion, or when calling one subprogram from
another. PERL's solution to this is local variables. Local variables are
declared by sending a list of them to the local function. Optionally, one can
assign values to the variables (just as in sequential-list assignment) in
their declaration:

sub factorial {
    local ($n) = $_[0];
    if ($n < 2) { return 1; }
    return $n * &factorial($n-1);
}
print &factorial(10)."\n";

which yields:

3628800

which is even the correct answer!
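
What local actually does is save a global's value and restore it when the enclosing block exits, and any subroutine called in the meantime sees the temporary value. The sketch below tries to show that. It assumes a current Perl 5 interpreter, so it declares the global with our and uses my (a later addition to the language) for the parameter; all names are hypothetical.

```perl
use strict;
use warnings;

our $depth = 0;                  # a global variable

sub current_depth { return $depth; }

# Recurse $levels deep and report the depth seen at the bottom.
sub descend {
    my ($levels) = @_;
    local $depth = $depth + 1;   # old value saved, restored on return
    if ($levels > 1) { return descend($levels - 1); }
    return current_depth();      # callees see the temporary value
}

print descend(3), " ", $depth, "\n";
```

After the outermost call returns, $depth is back to its original value without any explicit cleanup.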

Listing files in a subtree is a fine example of the use of many of the PERL
features we've seen so far. Consider this example:

 1 sub list {
 2     local($start, $i) = $_[0];
 3     chdir $start;
 4     foreach $i (<*>) {
 5         print $i;
 6         if(-d $i) {
 7             print "/\n";
 8             do list("$start/$i");
 9             chdir $start;
10         }
11         else { printf "\n"; }
12     }
13 }
14 do list("/tmp");

The list subroutine is called with a single argument that is the directory to
be listed recursively. Line 2 declares that the directory to be listed and a
loop variable are both local (necessary if we are to recurse). We then change
to the directory of interest, expand the filenames and start the main loop
(line 4).

After printing the file or directory name (line 5), we check to see if it is a
directory. If it is a directory, we print the / and recursively call the list
subroutine with a new directory to list. Upon return, we must change directory
back to the directory of interest if future chdirs are to work correctly. If
the filename represents a regular file (line 11), then we print the newline
and continue.

Unfortunately, because of the way PERL implements its filename-expansion
mechanism, it is not possible (as of this writing) to call subroutines
recursively while using filename expansion. So, undaunted (and assuming the
boss is looking over our shoulder), let's proceed to work around this problem.

Let's save the list of files (since that is the flaw) and try another
approach:

 1 sub list {
 2     local($start,$i,@fil) = $_[0];
 3     chdir $start;
 4     @fil = <*>;
 5     foreach $i (@fil) {
 6         print $i;
 7         if(-d $i) {
 8             print "/\n";
 9             do list("$start/$i");
10             chdir $start;
11         }
12         else { printf "\n"; }
13     }
14 }
15 do list("/tmp");

Line 2 declares the appropriate variables to be local, including the list of
filenames that we'll traverse and the starting directory. A quick recursion
test shows that the @_ arguments are correctly preserved throughout one level
of recursion, but not deeper ones. Local variables are the easiest recourse
for this problem. The rest of the routine is similar except for the locally
saved list of filenames.
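
For comparison, here is a sketch of the same subtree listing built on opendir and readdir, which avoids both chdir and filename expansion. It assumes a current Perl 5 interpreter; the list_tree name, the sorted order, and the choice not to follow symbolic links are decisions made for this example.

```perl
use strict;
use warnings;

# Return the entries under $start, depth first, with "/" appended
# to the name of each directory.
sub list_tree {
    my ($start) = @_;
    my @out;
    opendir(my $dh, $start) or die "Couldn't open $start: $!";
    my @names = sort grep { $_ ne "." && $_ ne ".." } readdir($dh);
    closedir($dh);
    foreach my $name (@names) {
        my $path = "$start/$name";
        if (-d $path && !-l $path) {   # descend into real directories only
            push(@out, "$name/", list_tree($path));
        } else {
            push(@out, $name);
        }
    }
    return @out;
}

print "$_\n" foreach list_tree(".");
```

Returning a list instead of printing inside the recursion leaves the caller free to filter or count the entries.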

PERL is a powerful tool for many programming tasks. You now have enough
information to program many day-to-day scripts in PERL.
