Dec 26, 2017
 
Text of Part 3 of PERL language from July 1990 UNIX Review.

UNIX Review August 1990 volume 8 number 7 p44(4)

PERL: THE SUPER-LANGUAGE, PART III

by Rob Kolstad (Daemons and Dragons column)


Two columns ago we began a discussion of PERL, the "Practical Extraction and
Report Language" (see Vol. 8 Nos. 5 and 6). PERL combines all the best
features of C, sed, awk, shell programming, database access, and text
manipulation into one giant, kitchen-sink language. This month, after a quick
recap, we'll cover PERL's built-in functions, dbm databases, and several
examples of their uses.

Recap
~~~~~

So far we've explored PERL's three fundamental scalar data types: numeric,
boolean, and string. All references to scalar variables in PERL are of the
form $name, where name is the variable's name. PERL resembles awk in that
variables are typed by their most recent assignment and require no previous
declaration. PERL uses print and printf for output.

PERL has composite variables. Vectors are one-dimensional and are indexed
numerically from 0 (by default) with square brackets surrounding the index.
While single values (scalars or single-element vectors) are accessed using
the $ prefix, naming an array using the @ prefix references the entire array.

PERL also supports associative arrays that use a list (possibly having only
one element) of scalar data types as their indices. Associative array
references use curly braces instead of square brackets.

The language provides a number of predefined variables, such as the current
process id and ARGV, a vector of invocation arguments.

PERL uses all of C's operators except the type-casting and address operators
(& and *). Additionally, PERL has exponentiation (**, **=), the range
operator (..), string concatenation (., .=), string repetition (x, x=), and
- like the shell - file tests.

The flow-control constructs in PERL are very similar to those in C and awk
(with the exception that PERL lacks a case statement). Unlike C, however,
PERL's control constructs always require a set of enclosing braces. PERL
uses next and last instead of C's continue and break.

Through PERL you can open files for reading, writing, or appending, and
pipes for either reading from or writing to a command. The input and output
statements are simple.

PERL often uses the variable $_ as a default variable. When no specification
is made for the receipt of input, for example, the input goes to $_. Also,
functions for which the argument is omitted often operate on $_ (chop, for
example, the function that removes the last character of a string).

The language supports regular expressions very similar to those of egrep and
has metacharacters for matching alphanumerics, numerics, white space, and
word boundaries. PERL supports =~ for substitution and translation. PERL
also has built-in variables that note the portion of a string before a match,
the portion that matches, and the portion after a match. The =~ and !~
operators work in conditional expressions to tell whether a match occurred or
not.

You can access directories through PERL in several ways, including the ls
command and a pipe, with the built-in opendir and readdir commands, or by
using angle brackets surrounding filename matching metacharacters.

PERL subprograms can be either functions (invoked with & and using the return
statement) or subroutines (invoked with do). Subprograms are declared like
this:

sub test {
    print "Hello world\n";
}

Parameters magically appear in the @_ array (which is distinct from the
scalar $_ variable). Local variables (as distinguished from global variables)
must be noted in a list passed to PERL's built-in function local. Parameters
to a subprogram in the array $_[xxx] are call-by-reference (like FORTRAN) and
assignments to the @_ array can change the values in the calling segment.
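
The call-by-reference behavior can be seen in a small sketch. The subroutine names double_in_place and double_copy are invented for illustration, and the code uses my, which assumes a Perl 5 interpreter rather than the version described in this column (where local plays that role).

```perl
# Assigning to an element of @_ changes the caller's variable.
sub double_in_place {
    $_[0] *= 2;
}

# Copying the parameter first leaves the caller's variable alone.
sub double_copy {
    my ($n) = @_;          # $n is a private copy of the first argument
    $n *= 2;
    return $n;
}

my $x = 21;
my $y = double_copy($x);   # $x is still 21, $y is 42
double_in_place($x);       # $x itself is now 42
print "x=$x y=$y\n";
```

Copying @_ into private variables at the top of a subprogram is the usual way to avoid changing the caller's data by accident.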

Array and Vector Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~

PERL provides a powerful set of functions that operate on vectors and arrays.
The first set of these functions enables easy treatment of vectors as lists.
Four such vector-manipulation functions are:

pop: removes the last value from the end of an array;
push: adds a value to the end of an array;
shift: removes the first value from the front of an array;
unshift: adds a value to the front of an array.

Here's an example that demonstrates how two of the functions can be combined
to make a rotate function (join is explained below):

@vec = (0, 2, 4, 6, 8);
print join(' ', @vec)."\n";
push (@vec, shift(@vec));
print join(' ', @vec)."\n";

which yields:

0 2 4 6 8
2 4 6 8 0
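
The other two list functions give a rotation in the opposite direction. The following is only a sketch: rotate_left and rotate_right are hypothetical names, and the use of my assumes Perl 5.

```perl
# Rotate left: move the first element to the end.
sub rotate_left {
    my @v = @_;
    push(@v, shift(@v)) if @v;
    return @v;
}

# Rotate right: move the last element to the front.
sub rotate_right {
    my @v = @_;
    unshift(@v, pop(@v)) if @v;
    return @v;
}

my @vec = (0, 2, 4, 6, 8);
print join(' ', rotate_left(@vec)), "\n";    # 2 4 6 8 0
print join(' ', rotate_right(@vec)), "\n";   # 8 0 2 4 6
```

Both subroutines work on a copy of their arguments, so @vec itself is left unchanged.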

The awk-like function that breaks a line into constituent fields is called
split. It can have up to three arguments: a search pattern (specified with
slashes), an expression to be split, and an optional maximum limit of fields
to be assigned. The result of the split operation is assigned to a vector.
The first two arguments default to white space and $_, respectively. Here's a
quick program that prints user names and home directories from a standard
password file (which usually consists of seven colon-separated fields):

open (PASSWD, "/etc/passwd");
while (<PASSWD>) {
    ($login, $passwd, $uid, $gid, $gcos,
     $home, $shell) = split(/:/);
    print "$login $home\n";
}

Note the aggregate assignment for the results of the split function. An
alternate - in this case, slightly less readable - implementation might be:

open (PASSWD, "/etc/passwd");
while (<PASSWD>) {
    @fields = split(/:/);
    print "$fields[0] $fields[5]\n";
}

The inverse of split is join. It hooks fields together with a separator
character between them:

$line = join (':', @fields);

Sometimes when you use complicated expressions for split, join is not its
perfect inverse. If you use "white space" for your splitting criteria, for
example, it is not possible to know how many spaces separated the original
input fields.
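
A short sketch makes the difference concrete. The sample strings are made up, and the code assumes Perl 5 (it uses my and a negative limit, which tells split to keep trailing empty fields).

```perl
my $line   = "alpha   beta\tgamma";     # three spaces, then a tab
my @fields = split(' ', $line);          # ' ' splits on runs of white space
my $again  = join(' ', @fields);         # "alpha beta gamma"
print scalar(@fields), " fields\n";
print "original spacing lost\n" if $again ne $line;

my $entry = "kolstad:x:28:5::";          # made-up passwd-style line
my $copy  = join(':', split(/:/, $entry, -1));
print "colon-separated line survives\n" if $copy eq $entry;
```

With a single-character separator and the negative limit the round trip is exact; with white-space splitting the original tabs and runs of spaces cannot be recovered.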

Another useful array function is reverse. It inverts the order of a list:

foreach $tick (reverse 0..10)
{ print "$tick "; }
print "\n";

which yields:

10 9 8 7 6 5 4 3 2 1 0

PERL supplies a sort function that operates on and returns a list. In its
rawest form, sort sorts alphabetically. Note the order of this program's
output:

for ($i = 0; $i < 10; $i++)
{ $nums[$i] = int(rand(25)); }
foreach $i (@nums)
{ print "$i "; }
print "\n";
foreach $i (sort @nums)
{ print "$i "; }
print "\n" ;

which yields:

12 4 7 13 23 4 17 5 12 3
12 12 13 17 23 3 4 4 5 7

Of course the output is correct for lexicographic sorting. Few people sort
numbers lexicographically, though.

The sort function can take an auxiliary almost-function name as an additional
argument. The almost-function name refers to an almost-function that has
implicit arguments $a and $b. The almost-function's goal is to evaluate to a
number that is negative, zero, or positive, which indicates (just like C's
qsort) whether $a is, respectively, less than, equal to, or greater than $b. As
an almost-function the arguments are implicit and the last expression is
implicitly the return value. Here's a simple example.

sub numerically { $a-$b; }
for ($i = 0; $i < 10; $i++)
{ $nums[$i] = int(rand(25)); }
foreach $i (@nums)
{ print "$i "; }
print "\n";
foreach $i (sort numerically @nums)
{ print "$i "; }
print "\n";

which yields:

12 4 7 13 23 4 17 5 12 3
3 4 4 5 7 12 12 13 17 23

Note that the return is missing from this stripped-for-speed almost-function.

It is unfortunate that sort takes only a single list. If you have a set of
pairs - such as name and age - that you wish to sort together, you might wish
to combine them into a single variable (maybe using sprintf, join, or
concatenation), sort them (most certainly with your own special auxiliary
almost-function), then split them apart again.
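
One way this combine-sort-split approach might look is sketched below. The data, the colon separator, and the by_age name are all invented for illustration, and my assumes Perl 5.

```perl
my %age = ("rob" => 33, "ann" => 7, "lee" => 41);   # invented sample data

# Pack each pair into one string of the form "age:name".
my @packed;
foreach my $name (keys %age) {
    push(@packed, join(':', $age{$name}, $name));
}

# Compare on the numeric age first, then on the name to break ties.
sub by_age {
    my ($age_a, $name_a) = split(/:/, $a);
    my ($age_b, $name_b) = split(/:/, $b);
    $age_a - $age_b || $name_a cmp $name_b;
}

my @ordered = sort by_age @packed;

# Split the sorted strings back into their two parts.
foreach my $pair (@ordered) {
    my ($years, $name) = split(/:/, $pair);
    print "$name is $years\n";
}
```

The separator has to be a character that cannot occur in either field, otherwise the final split puts the pieces back together incorrectly.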

PERL provides a grep function that selects elements of a list. Consider the
following segment that reads an entire file into memory, then eliminates
those lines containing a leading octothorp (phone-company jargon for #):

if (open(IN, "infile") != 1)
    { die "Can't open infile\n"; }
@list = <IN>; # read entire file
close(IN);
@list = grep (!/^#/, @list);

It is important to note that grep returns a list and not a scalar.
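
What happens when the result is assigned to a scalar anyway can be checked with a few lines. This sketch assumes Perl 5, where grep in scalar context yields the number of matching elements; the sample lines stand in for the contents of a file.

```perl
my @lines = ("# a comment\n", "first\n", "# another\n", "second\n");
my @kept  = grep(!/^#/, @lines);   # list context: the matching elements
my $count = grep(!/^#/, @lines);   # scalar context: how many matched
print "kept $count lines: ", join('', @kept);
```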

Manipulating the keys and values of associative arrays is easy. The each
iterator successively returns a two-valued list of the key and value pairs of
an associative array (returning null after the last, after which the iterator
starts over):

while (($key, $value) = each %array) {
    printf "%s is %s\n", $key, $value;
}

The pairs appear in what seems to be a random order. The each iterator (and
the functions discussed below) operate on the associative array's name,
specified with the % prefix.

Just as easy to use is the keys function that returns a list of the keys in
an associative array (also in an apparently random order). Here's the same
program written using keys:

foreach $key (keys %array) {
    printf "%s is %s\n", $key, $array{$key};
}

PERL also has a values function that operates analogously.
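
Because the order is unpredictable, it is common to sort the keys before printing. The sketch below shows that together with values; the sample data is invented and my assumes Perl 5.

```perl
my %array = ("kolstad" => 28, "polk" => 31, "root" => 0);   # invented data

# sort keys gives a repeatable, alphabetical order.
foreach my $key (sort keys %array) {
    printf "%s is %s\n", $key, $array{$key};
}

# values returns just the values; here they are added up.
my $total = 0;
foreach my $value (values %array) {
    $total += $value;
}
print "total is $total\n";
```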

PERL supplies many string functions with operations similar to those of C's
string functions. These include: crypt, index, rindex, length, substr, and
sprintf.

This program segment shows an example of each of these functions:

$a = "Random String";
print crypt($a, "uB")."\n";
# "uB" is the 'salt'
print index($a,"n")."\n";
print rindex($a,"n")."\n";
print length($a)."\n";
print substr($a,1,3)."\n";
print substr($a,1,99999)."\n";
print sprintf("--%s--\n",$a);

which yields:

uBH4o4BJIJZ6g
2               first 'n' is at 2
11              last 'n' is at 11
13              length is 13
and             3 characters starting at character 1
andom String    rest of characters
--Random String--

The only one of these that is the least bit tricky is substr. The first
character is number 0, unless you reset $[. If you specify a negative offset
(second argument), then substr counts from the end of the string. Best of
all, you can use substr as an lvalue and assign to it on the left side of an
equal sign. Of course, if you assign a string shorter than the length
specified, the resultant final string will shrink; if you assign something
longer, it will grow. You can use sprintf or other combinations of functions
to keep the length constant.
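
A brief sketch of negative offsets and of substr as an lvalue follows. The replacement text is arbitrary and my assumes Perl 5.

```perl
my $s = "Random String";
my $tail = substr($s, -6);                        # counted from the end: "String"

my $shrunk = $s;
substr($shrunk, 0, 6) = "Odd";                    # shorter text: string shrinks

my $padded = $s;
substr($padded, 0, 6) = sprintf("%-6s", "Odd");   # padded to 6: length unchanged

print "$tail\n$shrunk\n[$padded]\n";
print length($s), " ", length($shrunk), " ", length($padded), "\n";
```

The last line prints 13 10 13: the unpadded assignment loses three characters, while the sprintf-padded one keeps the original width.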

System Calls
~~~~~~~~~~~~

To save time in making new processes, PERL supplies several standard
file-manipulation system calls as built-in functions. These include mkdir,
rmdir, chmod, chown, link, symlink (if supported), stat, rename, and unlink.
Here is some code that exemplifies the use of each of these system calls
(including symlink):

mkdir ("/tmp/foo", 0775);
rmdir "/tmp/foo";

$count = chmod 0755, "foo", "bar";
print "I changed $count modes\n";
$count = chown 28, 5, 'foo', 'bar';
# numerical uid gid
print "I changed $count uid/gid's\n";

link("/tmp/oldfile", "/tmp/newfile");
symlink ("/tmp/oldfile2", "/tmp/newfile1");

($dev,$ino,$mode,$nlink,$uid,
$gid,$rdev,$size,$atime,$mtime,
$ctime,$blksize,$blocks)
= stat($filename);
# stat can also use filehandles for its argument

rename("oldfilename", "newfilename");

$count = unlink "file1", "file2", "file3";
print "I removed $count files\n";
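
These calls return a false value on failure and leave the reason in $!, so it is worth checking them. The sketch below runs a few of the calls against a scratch directory; the path is hypothetical, a Unix-like system with a writable /tmp is assumed, and my assumes Perl 5.

```perl
my $dir = "/tmp/perl-demo-$$";            # scratch directory; $$ is the process id

mkdir($dir, 0775)              || die "Can't mkdir $dir: $!\n";
open(OUT, ">$dir/old")         || die "Can't create $dir/old: $!\n";
print OUT "hello\n";
close(OUT);

rename("$dir/old", "$dir/new") || die "Can't rename: $!\n";
my $size = (stat("$dir/new"))[7];         # element 7 of stat's list is the size
print "new is $size bytes\n";

my $count = unlink("$dir/new");           # number of files actually removed
rmdir($dir)                    || die "Can't rmdir $dir: $!\n";
print "I removed $count files\n";
```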

Of course, looking up those numerical user and group ids for the chown call
isn't too hard:

$user = "kolstad";
open(PASS, '/etc/passwd')
    || die "Can't open passwd: $!\n";
@passwd = <PASS>;
close(PASS);
@line = grep (/^${user}:/, @passwd);
($login,$pass,$uid,$gid) = split(/:/, $line[0]);
print "${user}'s uid is ${uid}, gid is ${gid}\n";
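
Perl also has a built-in getpwnam function that performs the same lookup without reading the file by hand. The sketch below assumes a Unix-like system and Perl 5; ids_for is a hypothetical helper name.

```perl
# Return (uid, gid) for a login name, or an empty list if it is unknown.
sub ids_for {
    my ($name) = @_;
    my @pw = getpwnam($name);     # uid and gid are elements 2 and 3
    return @pw ? @pw[2, 3] : ();
}

my $user = "kolstad";
my ($uid, $gid) = ids_for($user);
if (defined $uid) {
    print "${user}'s uid is ${uid}, gid is ${gid}\n";
} else {
    print "no such user: $user\n";
}
```

Unlike the file-reading version, this also works on systems where account data is not kept in /etc/passwd, as long as the system's password lookup routines can find it.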

Dbm Files
~~~~~~~~~

PERL has facilities that ease manipulation of dbm-style files. The dbmopen
call binds a dbm or ndbm file to an associative array specified as dbmopen's
first argument. The associative array is not a file handle in actuality,
though it may resemble one. The second argument to dbmopen specifies the
database name - but without the extension (neither .dir nor .pag). You may
wish to check the existence of the file first, as this call will create a new
database with the file-protection mode specified by the third argument (and
modified by umask) if the database does not exist. Of course, this function
will fail if your system does not support dbm or ndbm. If you have write
access to the dbm file, you can set values in the associative array and they
will magically appear in the file.

Here's a simple example that will print the aliases for the mail system.

# print out the mail aliases
dbmopen(ALIASES,"/etc/aliases",0666);
while (($key,$val) = each %ALIASES) {
    print "$key=$val\n";
}
dbmclose(ALIASES);

and here is just a fraction of the output from my system:

kolstadt= kolstad
kohlsted= kolstad
robbie= kolstad
postmaster= polk
notes= "|/nbin/nfmail nfmaint"

Of course, the first thing I wanted to do was look up a single alias. After
opening the file in a test program, I executed this statement:

print $ALIASES{"colsted" }."\n";

and got a single newline as my output. I found out (don't ask me how one
would know this without being told) that one must append a null character to
the end of the lookup string:

print $ALIASES{"colsted"."\000"}."\n";

and this works perfectly well.
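
A lookup helper can hide this detail by trying the key both ways. The following is a sketch only: lookup_alias is a hypothetical name, an ordinary in-memory hash with made-up entries stands in for the dbm-bound array, and the hash reference and my assume Perl 5.

```perl
# Look a name up in an alias table, trying the key as given and then
# with a trailing null character appended.
sub lookup_alias {
    my ($table, $name) = @_;      # $table is a reference to the hash
    foreach my $key ($name, $name . "\000") {
        return $table->{$key} if exists $table->{$key};
    }
    return undef;
}

# Stand-in data: one key stored with a trailing null, one without.
my %ALIASES = ("colsted\000" => " kolstad", "plain" => "polk");
my $target = lookup_alias(\%ALIASES, "colsted");
print "colsted=$target\n";
```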

As you can see, PERL is the kitchen-sink language. I've been writing more
programs with it in the past few months than I expected.

I made an abortive attempt to thoroughly familiarize myself with PERL a year
ago, but found myself spending far too much time referring back to the manual
to look up this or that obscure function. While C has only one or two dozen
major constructs and then two more manual sections, my exposure to PERL was
only through a then-38-page manual that contained dozens and dozens of
functions and calls.

The advent of reference cards and short tutorials with examples has changed
all that. I began again in earnest to use PERL - as a substitute for awk -
for a series of programs that manipulate science-fair data. PERL can specify
just exactly when it would like to read a new input line - a feature very
difficult to implement in awk programs. It was only a short time before I was
hooked by the ease of writing and debugging PERL programs. The interpretive
nature means my errors came back to me very quickly, and I was able to
concentrate on single portions of the program without being interrupted by
compiles.

After about eight hours of actual coding, I can write most PERL programs
without referring to the manual. I still write little discardable
three-liners to learn precisely how a function works, but my confidence has
grown enormously.

I hope you enjoy the same success with PERL that I've had. I've found PERL to
be just the trick for massaging data, manipulating files to be what I want
them to be, and general small-task programming. None of my PERL programs is
longer than 100 lines yet. I have learned to use the include mechanism and
have a set of reusable subroutines that I save in a PERL directory.

Best of luck to you with PERL!



