Dec 21, 2017
 
Ashton-Tate dBASE IV Tech Notes for June 90. Useful information.
File TN9006.ZIP from The Programmer’s Corner in
Category Dbase Source Code
File Name     File Size  Zip Size  Zip Type
AWK.TXT           30894      9979  deflated
CLONIN.TXT         3243      1502  deflated
SPLITT.TXT        15380      3896  deflated
UDFSAM.TXT        13100      4298  deflated


Contents of the AWK.TXT file


4.1 AWK

*********************************************************
*** Special Note: See section 4.2 Errata & Addendum. ***
*********************************************************

I Am Not a Parrot
Import And Export With AWK
Roger Wegehoft

One of the impediments to using a powerful database management system
such as dBASE IV can be that of moving information to and from
external applications or data files. There exists a perplexing array
of file formats which can intimidate even the most steadfast and
practiced dBASE programmer. Since it isn't practical to have an
Import or Export option for every product with which we would like to
interchange data we are often left to our own devices.

While almost all applications external to dBASE IV support some form
of file import or export, the most common is likely to be plain ASCII
text. The use of plain text is not, however, without its own
peculiar set of problems. In this article we will examine some of
those, addressing both specifics and generalities, with the help of a
particularly flexible tool called the AWK programming language. We
will look at some common problems and offer specific solutions which
may be generalized over a wide range of tasks with similar taxonomy.

What is AWK?

AWK is a programming language originally designed and implemented in
1977 by Aho, Kernighan, and Weinberger. It started as part of an
experiment to see if the characteristics of regular expression pattern
matching as exemplified in the UNIX(TM) utility grep could be combined
with many of the features of sed, a stream editor, to create
a generalized program for dealing with text and numbers.

Each AWK program is composed of one or more sets of patterns which are
associated with a corresponding action and have the form:

pattern { action }

The patterns are comprised of regular expression templates and are
compared against lines of text read from either the standard input or
a file. If a pattern matches, then the associated action is
performed. The action can be as simple as echoing the current line or
something more involved such as computation, translation or formatting
of the input data.
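
As a minimal sketch of the pattern { action } form, the one-rule
program below prints every input line that contains the text "Box".
It is wrapped in a POSIX shell function, print_box_lines (a
hypothetical name), and assumes a modern awk on the path; with the DOS
versions of AWK discussed later, the program text between the single
quotes would be saved in a file instead.

```sh
# print_box_lines: hypothetical wrapper around a one-rule AWK program.
print_box_lines() {
    # Pattern: /Box/    Action: print the current line ($0).
    awk '/Box/ { print $0 }'
}

# Usage: feed three address lines through the program.
printf '%s\n' 'Sidebar Ranch' 'Box 18' 'RR 37' | print_box_lines
```

Only the line "Box 18" matches the pattern, so it is the only line
echoed.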

AWK Program Patterns

In an AWK program patterns control the execution of actions. As
before, when a pattern matches, then its associated action is
executed. There are six different types of patterns in AWK and a
thorough familiarity with each is required to understand the programs
which will be presented later in this article. Let's start off with
the "BEGIN" pattern which has the form:

BEGIN { statements }

The BEGIN pattern statements are executed once before any input has
been read. Next come the "expression" patterns. The statements of an
expression pattern are executed for each input line at which the
expression evaluates to true, that is, nonzero or non-null. They have
the following form:

expression { statements }
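
The following sketch combines a BEGIN pattern with an expression
pattern. The shell function name long_lines and the 12-character
threshold are illustrative assumptions, and a modern awk is assumed.

```sh
# long_lines: hypothetical helper that lists input lines longer than 12 characters.
long_lines() {
    awk '
        BEGIN { print "LONG LINES" }          # runs once, before any input is read
        length($0) > 12 { print NR ": " $0 }  # runs for each line where the expression is true
    '
}

# Usage: only the second line is longer than 12 characters.
printf '%s\n' 'Apt #7' 'Sea Breeze Towers' 'RR 37' | long_lines
```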

Another pattern, and probably the most widely used, is the "/regular
expression/." It has the form:

/regular expression/ { statements }

A regular expression is a notation for specifying and matching text.
Each regular expression is a basic expression which is created from
combinations of characters, some of which have special significance.
These special characters are called metacharacters; they are listed
here and described briefly in the sidebar that follows.

\ ^ $ . [ ] | ( ) * + ?

================================================================

Metacharacters in AWK

The regular expression metacharacters are:

    \ ^ $ . [ ] | ( ) * + ?

A basic regular expression is one of the following:

    a nonmetacharacter, such as A, that matches itself.
    an escape sequence that matches a special symbol: \t matches a tab.
    a quoted metacharacter, such as \*, that matches the metacharacter
      literally.
    ^, which matches the beginning of a string.
    $, which matches the end of a string.
    ., which matches any single character.
    a character class: [ABC] matches any of the characters A, B, or C.
    character classes may include abbreviations: [A-Za-z] matches any
      single letter.
    a complemented character class: [^0-9] matches any character except
      a digit.

Operators combine regular expressions into larger ones:

    alternation: A|B matches A or B.
    concatenation: AB matches A immediately followed by B.
    closure: A* matches zero or more A's.
    positive closure: A+ matches one or more A's.
    zero or one: A? matches the null string or A.
    parentheses: (r) matches the same strings as r does.

================================================================
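
To see a few of these metacharacters at work, here is a small sketch
that labels each input line according to the first regular expression
it matches. The function name classify_lines and the labels are
assumptions made for illustration; the program assumes a modern awk
called from a POSIX shell.

```sh
# classify_lines: hypothetical helper that labels lines by regular expression.
classify_lines() {
    awk '
        /^[0-9]+$/      { print "digits: " $0; next }  # ^ and $ anchor; + is one or more
        /^[A-Z][A-Z]$/  { print "state: " $0; next }   # two character classes in a row
        /^Mrs?\. /      { print "title: " $0; next }   # ? is zero or one; \. is a literal dot
                        { print "other: " $0 }         # no pattern: matches every line
    '
}

# Usage: one sample line for each rule.
printf '%s\n' '90325' 'CA' 'Mrs. Mary Smart' 'Apt #7' | classify_lines
```
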
A compound pattern combines expressions with && (AND), || (OR), !
(NOT), and parentheses. The statements are executed at each input
line where the compound pattern is true. They have a form which looks
like this:

compound pattern { statements }
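
A compound pattern might look like the following sketch, which keeps
lines that begin with a capital letter AND do NOT contain a digit. The
function name no_digit_lines is hypothetical, and a modern awk is
assumed.

```sh
# no_digit_lines: hypothetical helper built on a compound pattern.
no_digit_lines() {
    # /^[A-Z]/ must match AND /[0-9]/ must NOT match.
    awk '/^[A-Z]/ && !/[0-9]/ { print $0 }'
}

# Usage: "Box 18" and "RR 37" contain digits; "apt 7" starts in lower case.
printf '%s\n' 'Sidebar Ranch' 'Box 18' 'RR 37' 'apt 7' | no_digit_lines
```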

Finally we have range patterns. A range pattern will match each input
line matched by pattern1 to the next line matched by pattern2,
inclusive and may not be a part of any other pattern. The statements
are performed each time a line matches. Range patterns have the form:

pattern1, pattern2 { statements }
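
As a sketch of a range pattern, the program below prints, with line
numbers, everything from the first line that starts with "TO:" through
the next line that starts with "Thanks,". The function name
message_body and the two marker strings are assumptions chosen for
illustration; a modern awk is assumed.

```sh
# message_body: hypothetical helper built on a range pattern.
message_body() {
    # Start at a line beginning with "TO:", stop after one beginning with "Thanks,".
    awk '/^TO:/, /^Thanks,/ { print NR ": " $0 }'
}

# Usage: lines 2 through 4 fall inside the range.
printf '%s\n' 'Re: Archiving Messages' 'TO: Roger' 'Any ideas???' 'Thanks,' 'MMM' | message_body
```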

Last is the "END" pattern, which has the form:

END { statements }

In this pattern, the statements are executed once after all input has
been read. BEGIN and END patterns may not combine with other patterns
and are the only patterns which require an action.
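
BEGIN, an ordinary pattern, and END can be combined as in the
following sketch, which counts the non-blank lines of its input. The
function name count_nonblank is hypothetical, and a modern awk is
assumed.

```sh
# count_nonblank: hypothetical helper that counts non-blank input lines.
count_nonblank() {
    awk '
        BEGIN          { n = 0 }                      # once, before any input
        length($0) > 0 { n++ }                        # for every non-blank line
        END            { print n " non-blank lines" } # once, after all input
    '
}

# Usage: three of the four lines are non-blank.
printf '%s\n' 'Mr. John Nice' '1234 Wee St.' '' 'Littletown, OH 32532' | count_nonblank
```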

Now that we have a basic understanding of the types of patterns which
make up an AWK program we can proceed to examine some real-world
problems and their corresponding AWK program solutions.

The Mailing List

One frequent type of problem deals with importing data composed of
fields that are arranged vertically in an ASCII source file, such as a
mailing list purchased from a commercial list company or marketing
firm. A typical example is shown in Figure 1.
=================================================
1>Mr. John Nice
2>1234 Wee St.
3>Littletown, OH 32532
4>
5>
6>Mrs. Mary Smart
7>Apt #7
8>Sea Breeze Towers
9>Redondo Beach CA 90325
10>
11>Fred Rumble Sr.
12>Sidebar Ranch
13>Box 18
14>RR 37
15>Cloudburst, WY 72632
16>
=================================================
Figure 1 A sample mailing list in vertical ASCII format. NOTE: Line
numbers are not included in the actual file but are shown here to
amplify irregularities in the file format.


The problem here is twofold. First, we must determine what separates
one record from another; second, we must decide upon a strategy for
moving the data into our dBASE IV database.

At first glance it might seem that the addresses share little that
would mark where one record ends and the next begins. Each has an
addressee, which differs with each occurrence. The same can be said
for the one or more lines of street address (an additional problem we
hadn't mentioned) as well as the city, state, and zip code. Actually,
they all do have something in common: the pattern which expresses the
city, state, and zip code.

In the real world it is not unusual to see addresses with
typographical errors. For example, we have intentionally omitted the
comma between the city and state of the second address to illustrate a
point: although there are inconsistencies between various
presentations of the CSZ (city, state, zip code) information, there
are sufficient overlapping characteristics to give a high probability
of successfully recognizing the CSZ line. This high recognition rate
is usually sufficient to mark the boundary between records, even
though they may be of variable length with no special EOR (end of
record) marker.

The AWK program in Listing 1, MailList.AWK, will translate information
in this form into a comma delimited, carriage return and linefeed
separated ASCII text file suitable for APPENDing into any version of
dBASE. Because this is one of the more complicated AWK programs in
this article, we will take some time to examine it in detail so that
we can more easily understand those which will follow.

The first pattern following the header comment section is the BEGIN
pattern. As we mentioned earlier, the BEGIN pattern statements are
executed once before any input has been read. Typically this is the
place in an AWK program where various initializations are performed.
In this particular case we have initialized several regular expression
patterns, a system variable, and some local variables.

The first regular expression pattern,

csz1 = "^[A-z]+[, ]+[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]"

is designed to match one possible variation of the CSZ line. In this
case the "^" character means that the match must occur at the start of
the line. The "[A-z]+[, ]+" portion indicates that we are looking for
one or more contiguous upper or lower case alphabetic characters which
are immediately followed by one or more commas or spaces. (Strictly
speaking, the range A-z also admits the six ASCII punctuation
characters that lie between Z and a.)

The next portion of the pattern, "[A-Z][A-Z][ ]+", demands that the
previous segment be followed by exactly two capital letters and then
by one or more spaces. Finally we have the five consecutive "[0-9]"
character classes, which mean that a successful match must continue
with five digits in the range zero to nine. Because the pattern is
not anchored with "$", further text, such as a zip code extension, may
follow the five digits. If all of the preceding criteria are true
then the pattern matches and the statements following the pattern will
be executed. The other two regular expressions are similar in form
and have the purpose of attempting to take into account the majority
of variations of the CSZ line which we might find in our data.
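
To check how the patterns behave, the sketch below tries csz1 and csz2
(copied from Listing 1) against three lines from Figure 1 and reports
which one matched first. The function name csz_kind and the output
labels are assumptions, and the sketch assumes a modern awk that
accepts a regular expression held in a string variable.

```sh
# csz_kind: hypothetical helper that reports which CSZ pattern a line matches.
csz_kind() {
    awk '
        BEGIN {
            csz1 = "^[A-z]+[, ]+[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]"
            csz2 = "[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]"
        }
        $0 ~ csz1 { print "csz1: " $0; next }
        $0 ~ csz2 { print "csz2: " $0; next }
                  { print "none: " $0 }
    '
}

# Usage: one line with a comma, one without, and one ordinary address line.
printf '%s\n' 'Littletown, OH 32532' 'Redondo Beach CA 90325' 'Sidebar Ranch' | csz_kind
```

The second line fails csz1 because "Redondo" is followed by "Beach"
rather than by two capital letters, so it is caught by the unanchored
csz2 instead.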

A brief word about some of the variables used in the program is in
order. The system variable, OFS, refers to the Output Field
Separator. This is the character or series of characters which will
separate items in the output stream. In the case of a comma delimited
file it is, of course, a comma. In AWK, the OFS normally defaults to
a space (ASCII character 32). The program variable dquote has the
value of the double quote mark and maxlines is set to the maximum
number of lines per address. Oddly enough, in this program there is
only one pattern and it is the "NULL" pattern. In this case every
line will match and subsequently every line in the file will be read
and considered by the {actions} of the NULL pattern. If we were to
think of each line as a dBASE record then this is the AWK equivalent
of the dBASE language "DO WHILE .NOT. EOF()" looping construct.
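
The effect of OFS and dquote can be seen in a short sketch. The
function name quote_pair is hypothetical and a modern awk is assumed;
note that OFS is applied only between the comma-separated arguments of
print.

```sh
# quote_pair: hypothetical helper that emits two fields in comma delimited form.
quote_pair() {
    awk '
        BEGIN { OFS = ","; dquote = "\042" }          # \042 is the double quote mark
        { print dquote $1 dquote, dquote $2 dquote }  # the comma in print inserts OFS
    '
}

# Usage: a state and zip code pair.
printf '%s\n' 'OH 32532' | quote_pair
```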

An examination of the action portion of the NULL pattern will reveal a
basic IF..ELSE construct. The outer IF condition states that if the
current line ($0) matches csz1, csz2 or csz3, then we are matching a
CSZ line. Depending upon which pattern was matched, we should use the
appropriate parsing action to split out the city, state, and zipcode
information. Since the matching of a CSZ line indicates that we are
at the end of an address, the last action to perform is to output our
field information in comma delimited form. Otherwise, we can assume
that we are either somewhere between the end of the last record and
the beginning of the next or that we are processing valid address
lines.
In either case, the test "($0 !~ /^$/)" means that we take action only
if the line is not blank. If the line isn't blank, we store it in the
address array for later output when we have detected the end of the
record. Except for some minor housekeeping details, this is the
essence of the program and should help you understand the programs
which follow.

Line and Record Terminators

Some data are received with non-standard EOL (End Of Line) and EOR
(End Of Record) indicators. These can range from infrequently used
ASCII punctuation to special control characters or graphic symbols.
For example, it is not uncommon for businesses to own both IBM
compatible and Apple computers. However, each has its own common
form of storing text files. In the IBM/DOS environment, lines of text
will normally be terminated with a CR/LF (carriage return/linefeed)
combination. Macintosh computers, on the other hand, by convention
omit the linefeeds, leaving only carriage returns as the EOL
separator. Normally a conversion between the two standards can be
accomplished by using AFE (Apple File Exchange) on the Mac or a
specialized program on the PC. Other programs can also be found to do
this but they are usually written in a compiled language and more
often than not the source or programmer is not available if and when
modifications are required. For these occasions it's handy to be able
to do it yourself and the short AWK programs in Listing 2, AddLF.AWK
and StripLF.AWK will let you do just that.
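
For readers without PolyAWK, the same idea can be sketched with the
record separator variables alone. This version assumes a modern awk on
a Unix-like system, where no RAWMODE setting is needed; the function
name add_lf is hypothetical.

```sh
# add_lf: hypothetical helper that turns CR-terminated lines into CR/LF lines.
add_lf() {
    # Read records that end in a carriage return; write them with CR/LF.
    awk 'BEGIN { RS = "\r"; ORS = "\r\n" } { print $0 }'
}

# Usage: convert, then show each carriage return as "#" so the result is visible.
printf 'Line one\rLine two\r' | add_lf | tr '\r' '#'
```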

Importing Into Memo Fields

One of the more common (and complex) requests received at the
Ashton-Tate Software Support Center deals with importing compound data
structures into the dBASE IV environment. A typical case is data
organized as one or more lines of fixed field information followed by
varying lengths of freeform text, as in a bulletin board message dump
or an information service forum message listing; an example is shown
in Figure 2. While the previous examples
may also be accomplished with pure dBASE programming, this type of
scenario is very difficult for the novice programmer and extremely
inefficient if parsed only with the facilities of the dBASE language
itself. The AWK implementations ATBPars.AWK and ATMsgPar.AWK shown in
Listings 3 and 4 are not only efficient but succinct and provide a
foundation for variations on this theme with little effort.
===============================================
#77664 22-NOV-89 06:47 From: MMM To: /DB4 (1 reply)
Re: Archiving Messages

TO: Roger

I have what may seem like a strange problem but I've noticed that
there are a lot of good problems and solutions which appear on the
Ashton-Tate Bulletin Board. Many of these I would like to refer to
later, while I'm programming or designing an application.

Currently I just capture them to a text file but I would like to bring
them into dBASE for organizing and quick retrieval. I'd like to know
if there is an easy way to get these messages into a dBASE database?
Any ideas???

Thanks,

MMM

Figure 2 A typical example of a non-standard text format that can make
importing into dBASE something of a chore.
===============================================
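
As a sketch of the header parsing step, the program below turns the
first line of a message like the one in Figure 2 into one comma
delimited record. It is not the code of Listings 3 and 4: the month
lookup through a single string and the function name parse_header are
assumptions made for brevity, the century is fixed at "19" as in the
listings, and a modern awk is assumed.

```sh
# parse_header: hypothetical helper that converts a message header line
# into a quoted, comma delimited record (number, date, time, from, to).
parse_header() {
    awk '
        BEGIN { months = "JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC"; q = "\042" }
        /^#[0-9]+/ && $2 ~ /^[0-9][0-9]-[A-Z][A-Z][A-Z]-[0-9][0-9]$/ {
            msg = substr($1, 2)                     # drop the leading "#"
            split($2, d, "-")                       # d[1]=day, d[2]=month name, d[3]=year
            month = (index(months, d[2]) + 2) / 3   # JAN gives 1 ... DEC gives 12
            xdate = sprintf("19%s%02d%s", d[3], month, d[1])
            print q msg q "," q xdate q "," q $3 q "," q $5 q "," q $7 q
        }
    '
}

# Usage: the header line is converted; the subject line is ignored.
printf '%s\n' '#77664 22-NOV-89 06:47 From: MMM To: /DB4 (1 reply)' 'Re: Archiving Messages' | parse_header
```

The date is written as YYYYMMDD, the same form the listings build in
xdate.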

Running The Programs

To run any of the programs mentioned in this article you should select
the corresponding batch file with the appropriate
command line syntax for your version of AWK. If you prefer, you may
execute them directly from the DOS prompt. You may need to modify the
batch file if your AWK command line syntax differs from either of
those provided. There are two different versions of AWK mentioned in
this article. The first is PDAWK, a freeware implementation which is
available on the Ashton-Tate BBS, CompuServe, or GEnie. The other is
PolyAWK, a commercial product. The following command line syntax
pertains to PDAWK:

AWK output.fil

And this for PolyAWK:

AWK -f output.fil

Special Notes

The batch file PDPARSE is for the distributable version (which is not
public domain but free for non-commercial use) of AWK which may be
downloaded from the DOS utilities library. PolyAWK, the commercial
implementation of the AWK language, is considerably faster.

Execution of the batch file will invoke the AWK interpreter, read in
the ATMsgPar.AWK script and produce the file ATMsgPar.O and numerous
files of the form Mnnnnn.TXT where "nnnnn" is the message number
associated with the text in the file. Be aware that the AWK program
will create a file for each message in the captured text file. This
means that you should set the program and the associated files up in a
subdirectory and not in the root of your hard disk, since DOS limits
the number of entries in the root directory (typically 512 on a hard
disk). Additionally you will need free disk space roughly equal to
2.5 times the size of the ASCII capture file.

After the header and message files have been created, run
GetMsgTx.PRG, shown in Listing 5. It appends the header file
ATMsgPar.O and then makes a second pass to gather the text body of
each message into the corresponding memo field. The .DBF structure
referenced by GetMsgTx is shown in Figure 3 below.

===========================================================

Field  Field Name  Type       Width  Dec  Index

    1  MSGNUM      Character      6         N
    2  SIG         Character     20         N
    3  DATE        Date           8         N
    4  TIME        Character      8         N
    5  SUBJECT     Character     30         N
    6  FROM        Character     40         N
    7  TO          Character     40         N
    8  MSG         Memo          10         N

** Total **                     163

Figure 3 Structure for database: MSGSKEL.DBF
===========================================================

Now It's Your Turn

There are many other situations where text preprocessing is required.
If the text contains any kind of special character or repeating text
pattern then it is usually easy to construct an AWK program to
transform it into a file which dBASE IV can easily read. This
includes, as in the message parsing example, source files which
contain variable length "chunked" text associated with fixed field
header information.

In most cases a judicious application of the phrase "...the right tool
for the right job" applies, and the use of the AWK programming
language to facilitate Import and Export in the dBASE environment is
no exception. Although we have covered just enough of the AWK
language to help you understand and use the programs presented in this
article, with the aid of the AWK reference book listed under
REFERENCES, you should
be able to quickly familiarize yourself with AWK in sufficient detail
to tackle your own tenacious Import or Export problems.

Code References:

Listing 1: MailList.AWK
========================================================
# PURPOSE..: Parse address list into dBASE delimited format.
# AUTHOR...: Roger Wegehoft
# COPYRIGHT: Copyright (C) 1989 Ashton-Tate
BEGIN {
    # Setup patterns.
    csz1 = "^[A-z]+[, ]+[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]";
    csz2 = "[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]";
    csz3 = "[A-Z][A-Z][ ]+[0-9]+[-][0-9]+";
    # Initialize variables.
    OFS = ",";
    dquote = "\042";
    addrline = 1;
    maxlines = 5;   # Maximum lines per address
                    # (including City, State, Zip line).
}
# Read every line.
{
    if (($0 ~ csz1) || ($0 ~ csz2) || ($0 ~ csz3)) {
        if (match($0,csz1) > 0) {               # With comma.
            city = substr($0,1,index($0,",")-1);
            sz = substr($0,index($0,",")+2,length($0)-index($0,",")+2);
            split(sz,cityzip," ");
            state = cityzip[1];
            zip = cityzip[2]; }
        else {
            if (match($0,csz3) > 0) {           # Sets RSTART & RLENGTH
                si = RSTART; }
            else if (match($0,csz2) > 0) {      # Sets RSTART & RLENGTH
                si = RSTART; }
            city = substr($0,1,RSTART-1);
            sz = substr($0,si,RLENGTH);         # Matched state and zip only.
            split(sz,cityzip," ");              # As in Mail2 (section 4.4).
            state = cityzip[1];
            zip = cityzip[2];
        }
        outline = "";
        # NOTE: loop header reconstructed; part of this statement is
        # missing from the listing. The bound i<maxlines and the
        # clearing of address[i] are assumptions.
        for (i=1; i<maxlines; i++) {
            outline = outline dquote address[i] dquote ",";
            address[i] = "";    # Keep old lines out of shorter records.
        }
        outline = outline dquote city dquote ",";
        outline = outline dquote state dquote ",";
        outline = outline dquote zip dquote;
        printf("%s\n",outline);
        addrline = 1;
    }
    else if ($0 !~ /^$/) {
        # pickup address lines
        # print $0;
        address[addrline]=$0;
        addrline++;
    }
}


Listing 2: AddLF.AWK and StripLF.AWK
================================================================
# PROGRAM..: AddLF.AWK
# PURPOSE..: Add linefeeds to the end of text lines.
# AUTHOR...: Roger Wegehoft
# NOTE.....: Requires PolyAWK for RAWMODE operation.
BEGIN {
    RAWMODE = 7;
    RS = "\r";
    ORS = "\r\n";
}
{
    print $0;
}


# PROGRAM..: StripLF.AWK
# PURPOSE..: Remove linefeeds from the end of text lines.
# AUTHOR...: Roger Wegehoft
# COPYRIGHT: Copyright (C) 1989 Ashton-Tate.
{
    printf("%s\r",$0);
}


Listing 3: ATBPars.AWK
================================================================

# PURPOSE..: Split Ashton-Tate BBS messages logs into separate files.
# AUTHOR...: Roger Wegehoft
# COPYRIGHT: Copyright (C) 1989 Ashton-Tate.
# 10/07/89
BEGIN {
OFS = ",";
dquote = "\042";
f0 = "ATMsgPar.TXT";
}
{
if (($0 ~ /^#[0-9]+/) && ($2 ~ /[0-9][0-9]-[A-Z][A-Z][A-Z]-[0-9][0-9]/))
{
if (length(f0)>0) { close(f1); };
msg = substr($1,2,length($1)-1);
day = substr($2,1,2);
year = substr($2,8,2);
if (substr($2,4,3)=="JAN") { month = "01" };
if (substr($2,4,3)=="FEB") { month = "02" };
if (substr($2,4,3)=="MAR") { month = "03" };
if (substr($2,4,3)=="APR") { month = "04" };
if (substr($2,4,3)=="MAY") { month = "05" };
if (substr($2,4,3)=="JUN") { month = "06" };
if (substr($2,4,3)=="JUL") { month = "07" };
if (substr($2,4,3)=="AUG") { month = "08" };
if (substr($2,4,3)=="SEP") { month = "09" };
if (substr($2,4,3)=="OCT") { month = "10" };
if (substr($2,4,3)=="NOV") { month = "11" };
if (substr($2,4,3)=="DEC") { month = "12" };
xdate = "19" year month day;
xtime = $3;
from = $5;
to = $7;
sig = $7;
f1 = "M" msg ".TXT";
# Uncomment one or the other of the following lines, depending upon
# which type of in-memo cross referencing you prefer; partial, full,
# or comment out both if you prefer none at all.
printf("%s\n",msg) >>f1;
# printf("%s\n",$0) >>f1;
getline;
subject = substr($0,5,length($0)-4);
s = dquote msg dquote "," \
dquote sig dquote "," \
dquote xdate dquote "," \
dquote xtime dquote "," \
dquote subject dquote "," \
dquote from dquote "," \
dquote to dquote;
printf("%s\n",s) >>f0;
}
else
{
printf("%s\n",$0) >>f1;
}
}


Listing 4: ATMsgPar.AWK
=================================================================
# PURPOSE..: Split Ashton-Tate BBS messages logs into separate files.
# AUTHOR...: Roger Wegehoft
# COPYRIGHT: Copyright (C) 1989 Ashton-Tate.
# 10/07/89
BEGIN {
OFS = ",";
dquote = "\042";
f0 = "ATMsgPar.O"; # Header file read by GetMsgTx.PRG.
}
{
if (($0 ~ /^#[0-9]+/) && ($2 ~ /[0-9][0-9]-[A-Z][A-Z][A-Z]-[0-9][0-9]/))
{
if (length(f0)>0) { close(f1); };
msg = substr($1,2,length($1)-1);
day = substr($2,1,2);
year = substr($2,8,2);
if (substr($2,4,3)=="JAN") { month = "01" };
if (substr($2,4,3)=="FEB") { month = "02" };
if (substr($2,4,3)=="MAR") { month = "03" };
if (substr($2,4,3)=="APR") { month = "04" };
if (substr($2,4,3)=="MAY") { month = "05" };
if (substr($2,4,3)=="JUN") { month = "06" };
if (substr($2,4,3)=="JUL") { month = "07" };
if (substr($2,4,3)=="AUG") { month = "08" };
if (substr($2,4,3)=="SEP") { month = "09" };
if (substr($2,4,3)=="OCT") { month = "10" };
if (substr($2,4,3)=="NOV") { month = "11" };
if (substr($2,4,3)=="DEC") { month = "12" };
xdate = "19" year month day;
xtime = $3;
from = $5;
to = $7;
sig = $7;
f1 = "M" msg ".TXT";
# Uncomment one or the other of the following lines, depending upon
# which type of in-memo cross referencing you prefer; partial, full,
# or comment out both if you prefer none at all.
printf("%s\n",msg) >>f1;
# printf("%s\n",$0) >>f1;
getline;
subject = substr($0,5,length($0)-4);
s = dquote msg dquote "," \
dquote sig dquote "," \
dquote xdate dquote "," \
dquote xtime dquote "," \
dquote subject dquote "," \
dquote from dquote "," \
dquote to dquote;
printf("%s\n",s) >>f0;
}
else
{
printf("%s\n",$0) >>f1;
}
}

Listing 5: GetMsgTx.PRG
=============================================================
* PROGRAM...: GetMsgTx.PRG
* TECHNOTES.: 08/18/89
* AUTHOR....: Roger Wegehoft
* PURPOSE...: To read the output of ATMsgPar.AWK
* into a dBASE IV database.
* NOTES.....: This is just a SAMPLE.
* You'll probably have to change paths & etc.

SET TALK OFF
PUBLIC mtext
mtext = ""

SET PATH TO c:\dbase && Location of text files.
USE MSGSKEL && Message dbf template.
COPY STRUCTURE TO ATMSG && Copy structure to working dbf.
USE ATMSG

APPEND FROM atmsgpar.o TYPE DELIMITED && Read the message header info.
USE && Close & save.
USE ATMSG
GO TOP
CLEAR

*--- This loop reads the "body" of the message into the memo field.
SCAN
mtext = "M" + TRIM(msgnum) + ".TXT"
@ 1,0
@ 1,0 say "Appending message no.: "+msgnum
APPEND MEMO msg FROM (mtext) OVERWRITE
ENDSCAN

SET TALK ON
RETURN



The AWK programming language was originally designed and implemented
by Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger in 1977
as an experiment to see how the Unix tools grep, a program which finds
patterns composed of regular expressions, and sed, a stream editor,
could be generalized. AWK is a trademark of AT&T Bell Laboratories.
POLYTRON AWK is a product of Polytron Corporation. PDAWK is a
freeware implementation, available on the Ashton-Tate BBS, GEnie and
CompuServe.

REFERENCES: The AWK Programming Language, Alfred V. Aho, Brian W.
Kernighan, Peter J. Weinberger. Addison-Wesley, 1988.


4.2 Errata & Addendum

Errata & Addendum to the June '90 TechNotes article "I Am Not a
Parrot."

Because of deficiencies in the implementation of PDAWK which went
unnoticed at the time of the article's preparation, some programs
presented in the article, which work perfectly well with
PolyAwk(tm), the commercial version of AWK mentioned in the
article, may not work properly under PDAWK. One example of this
is the MailList program of Listing 1. PDAWK appears to have
trouble substituting variables for regular expressions, thus
requiring that the pattern variables be replaced directly by the
patterns themselves. Furthermore, the built-in function index(s,t)
does not appear to be implemented. Fortunately, the match(s,r)
function may be substituted for it by supplying a regular
expression in place of the target string of the index function.
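
The substitution can be sketched as follows: match($0, /,/) returns
the same position that index($0, ",") would, so the city can be cut
off in front of the comma. The function name city_before_comma is
hypothetical, and the sketch was written for a modern awk rather than
for PDAWK itself.

```sh
# city_before_comma: hypothetical helper that uses match() where index() is unavailable.
city_before_comma() {
    awk '{
        pos = match($0, /,/)    # position of the first comma, 0 if there is none
        if (pos > 0)
            print pos ": " substr($0, 1, pos - 1)
    }'
}

# Usage: the comma is the 11th character of the line.
printf '%s\n' 'Littletown, OH 32532' | city_before_comma
```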

Another problem with PDAWK is that it cannot properly recognize
the regular expression pattern:

/^$/

which is used to locate a null or empty line. The test for a
non-empty line, "$0 !~ /^$/", can be replaced with the expression
"length($0) != 0", which is correctly interpreted in both versions
of AWK.
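
That the two tests agree can be checked with a short sketch that
counts non-empty lines both ways. The function name
count_nonblank_two_ways is hypothetical, and a modern awk (in which
both forms work) is assumed.

```sh
# count_nonblank_two_ways: hypothetical helper comparing the two non-empty tests.
count_nonblank_two_ways() {
    awk '
        $0 !~ /^$/      { a++ }   # test used in Listing 1
        length($0) != 0 { b++ }   # replacement suggested for PDAWK
        END             { print a + 0, b + 0 }
    '
}

# Usage: two of the three lines are non-empty, whichever test is used.
printf '%s\n' 'Mr. John Nice' '' '1234 Wee St.' | count_nonblank_two_ways
```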

Other implementation differences may exist that affect the proper
operation of some of the programs presented in the article. I will
attempt to find common solutions to both versions as these are
brought to my attention. In the meantime I would recommend that,
if you are serious about using AWK, you consider investing in the
commercial version, PolyAwk(tm), due to its more robust and
efficient implementation.

Roger Wegehoft
Support Services
Ashton-Tate



4.3 MAILLIST

"Mr. John Nice","1234 Wee St.","","","Littletown","OH","32532"
"Mrs. Mary Smart","Apt #7","Sea Breeze Towers","","Redondo Beach ","CA","90325-1234"
"Fred Rumble Sr.","Sidebar Ranch","Box 18","RR 37","Cloudburst","WY","72632"


4.4 MAIL2 (A modified version for PDAWK)

BEGIN {
# Program: Mail2.AWK
# Author: Roger Wegehoft
# Purpose: Parse address list into dBASE delimited format.
# Setup patterns.
# NOTE: This is a modified version of Maillist.AWK which works in both
# PolyAwk(tm) and PDAWK.
# Unused variables:
# csz1 = "^[A-z]+[, ]+[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]";
# csz2 = "[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]";
# csz3 = "[A-Z][A-Z][ ]+[0-9]+[-][0-9]+";
# Initialize variables.
OFS = ",";
dquote = "\042";
addrline = 1;
maxlines = 5; # Maximum lines per address (including City, State, Zip line).
}
# Read every line.
{
# if (($0 ~ csz1) || ($0 ~ csz2) || ($0 ~ csz3)) {
if (($0 ~ /^[A-z]+[, ]+[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]/) ||
($0 ~ /[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]/) ||
($0 ~ /[A-Z][A-Z][ ]+[0-9]+[-][0-9]+/)) {
if (match($0,/^[A-z]+[, ]+[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]/) > 0) { # With comma.
city = substr($0,1,match($0,/[,]/)-1);
sz = substr($0,match($0,/[,]/)+2,length($0)-match($0,/[,]/)+2);
split(sz,cityzip," ");
state = cityzip[1];
zip = cityzip[2]; }
else {
if (match($0,/[A-Z][A-Z][ ]+[0-9]+[-][0-9]+/) > 0) { # Sets RSTART & RLENGTH
si = RSTART; }
else if (match($0,/[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]/) > 0) { # Sets RSTART & RLENGTH
si = RSTART; }
city = substr($0,1,RSTART-1);
sz = substr($0,si,RLENGTH); # Matched state and zip only.
split(sz,cityzip," ");
state = cityzip[1];
zip = cityzip[2];
}
outline = "";
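# NOTE: loop header reconstructed; part of this statement is missing
# from the listing. The bound i<maxlines and the clearing of
# address[i] are assumptions.
for (i=1; i<maxlines; i++) {
outline = outline dquote address[i] dquote ",";
address[i] = "";
}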
outline = outline dquote city dquote ",";
outline = outline dquote state dquote ",";
outline = outline dquote zip dquote;
printf("%s\n",outline);
addrline = 1;
}
# else if ($0 !~ /^$/) {
else if (length($0) != 0) {
# pickup only non-blank address lines
# print $0;
address[addrline]=$0;
addrline++;
}
}


4.5 MAIL2 (A batch file for PDAWK)

pdawk mail2.awk mail2.o

