*********************************************************
*** Special Note: See section 4.2 Errata & Addendum.  ***
*********************************************************

I Am Not a Parrot
Import And Export With AWK
Roger Wegehoft
One of the impediments to using a powerful database management system such as dBASE IV can be that of moving information to and from external applications or data files. There exists a perplexing array of file formats which can intimidate even the most steadfast and practiced dBASE programmer. Since it isn't practical to have an Import or Export option for every product with which we would like to interchange data, we are often left to our own devices.
While almost all applications external to dBASE IV support some form of file import or export, the most common is likely to be plain ASCII text. The use of plain text is not, however, without its own peculiar set of problems. In this article we will examine some of those, addressing both specifics and generalities, with the help of a particularly flexible tool called the AWK programming language. We will look at some common problems and offer specific solutions which may be generalized over a wide range of similar tasks.
What is AWK?
AWK is a programming language originally designed and implemented in 1977 by Aho, Kernighan, and Weinberger. It started as part of an experiment to see if the characteristics of regular expression pattern matching as exemplified in the UNIX(TM) utility grep could be combined with many of the features of sed, a stream editor, to create a generalized program for dealing with text and numbers.
Each AWK program is composed of one or more sets of patterns which are associated with a corresponding action and have the form:
pattern { action }
The patterns are comprised of regular expression templates and are compared against lines of text read from either the standard input or a file. If a pattern matches, then the associated action is performed. The action can be as simple as echoing the current line or something more involved such as computation, translation or formatting of the input data.
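As a minimal sketch of the pattern { action } form, the one-line program below prints every input line that contains the letters OH. It assumes a POSIX-style AWK; the sample input described afterwards is illustrative only.

```awk
# pattern: the regular expression /OH/
# action : echo the current line ($0)
/OH/ { print $0 }
```

Given the two input lines "Littletown, OH 32532" and "Cloudburst, WY 72632", only the first line is echoed, because only it matches the pattern.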
AWK Program Patterns
In an AWK program, patterns control the execution of actions. As noted above, when a pattern matches, its associated action is executed. There are six different types of patterns in AWK and a thorough familiarity with each is required to understand the programs which will be presented later in this article. Let's start off with the "BEGIN" pattern which has the form:
BEGIN { statements }
The BEGIN pattern statements are executed once before any input has been read. Next come the "expression" patterns. Their statements are executed for each input line at which the expression evaluates to true, that is, to a nonzero or non-null value. They have the following form:
expression { statements }
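The following sketch combines a BEGIN pattern with an expression pattern. The 20-character threshold and the heading text are arbitrary choices made for illustration, and a POSIX-style AWK is assumed.

```awk
# Executed once, before the first line of input is read.
BEGIN { print "Lines longer than 20 characters:" }

# Expression pattern: the action runs only for lines where the
# comparison is true. NR is the number of the current input line.
length($0) > 20 { print NR ": " $0 }
```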
Another pattern, and probably the most widely used, is the "/regular expression/." It has the form:
/regular expression/ { statements }
A regular expression is a notation for specifying and matching text. Regular expressions are built from ordinary characters and from characters which have special significance. These special characters, called metacharacters, are listed here and described briefly in the sidebar that follows.
\ ^ $ . [ ] | ( ) * + ?
==================<<<>>>==================

Metacharacters in AWK

The regular expression metacharacters are:

    \ ^ $ . [ ] | ( ) * + ?

A basic regular expression is one of the following:

    a nonmetacharacter, such as A, that matches itself.
    an escape sequence that matches a special symbol: \t matches a tab.
    a quoted metacharacter, such as \*, that matches the metacharacter literally.
    ^, which matches the beginning of a string.
    $, which matches the end of a string.
    ., which matches any single character.
    a character class: [ABC] matches any of the characters A, B, or C.
    character classes may include abbreviations: [A-Za-z] matches any single letter.
    a complemented character class: [^0-9] matches any character except a digit.

Operators combine regular expressions into larger ones:

    alternation: A | B matches A or B.
    concatenation: AB matches A immediately followed by B.
    closure: A* matches zero or more A's.
    positive closure: A+ matches one or more A's.
    zero or one: A? matches the null string or A.
    parentheses: (r) matches the same strings as r does.

================================================================

A compound pattern combines expressions with && (AND), || (OR), ! (NOT), and parentheses. The statements are executed at each input line where the compound pattern is true. Compound patterns have the form:
compound pattern { statements }
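A compound pattern might look like the sketch below, which selects lines that are not empty and that do not begin with a digit. The particular tests are chosen only for illustration, and a POSIX-style AWK is assumed.

```awk
# Both conditions must hold for the action to run:
#   1. the line is not empty, AND
#   2. the line does NOT start with a digit.
(length($0) > 0) && ($0 !~ /^[0-9]/) { print $0 }
```

Run against the mailing list of Figure 1, this would skip the blank lines and the street address "1234 Wee St." while echoing the remaining lines.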
Next we have range patterns. A range pattern matches each input line from a line matched by pattern1 through the next line matched by pattern2, inclusive. A range pattern may not be a part of any other pattern. The statements are performed each time a line matches. Range patterns have the form:
pattern1, pattern2 { statements }
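As an illustrative sketch of a range pattern (the two regular expressions are assumptions chosen to suit the sample mailing list, and a POSIX-style AWK is assumed), the program below prints one complete address block:

```awk
# Start at a line beginning with "Mrs." and continue through the
# next line that ends in five digits, inclusive.
/^Mrs\./, /[0-9][0-9][0-9][0-9][0-9]$/ { print $0 }
```

Applied to the data of Figure 1, this prints the four lines from "Mrs. Mary Smart" through "Redondo Beach CA 90325".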
Last are the "END" patterns which have the form:
END { statements }
In this pattern, the statements are executed once after all input has been read. BEGIN and END patterns may not combine with other patterns and are the only patterns which require an action.
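A short sketch of the END pattern follows. The counter name n is hypothetical, and a POSIX-style AWK is assumed.

```awk
# Count every non-blank line as it is read.
length($0) > 0 { n++ }

# Executed once, after the last line of input has been processed.
# Adding zero forces a numeric 0 if no line was ever counted.
END { print n + 0, "non-blank lines" }
```

For the sixteen-line sample shown later in Figure 1, which contains twelve non-blank lines, the program prints "12 non-blank lines".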
Now that we have a basic understanding of the types of patterns which make up an AWK program we can proceed to examine some real-world problems and their corresponding AWK program solutions.
The Mailing List
One frequent type of problem deals with importing data composed of fields that are arranged vertically in an ASCII source file. One example might be a mailing list purchased from a commercial list company or marketing firm. A typical sample is shown in Figure 1.

=================<<<>>>================
 1>Mr. John Nice
 2>1234 Wee St.
 3>Littletown, OH 32532
 4>
 5>
 6>Mrs. Mary Smart
 7>Apt #7
 8>Sea Breeze Towers
 9>Redondo Beach CA 90325
10>
11>Fred Rumble Sr.
12>Sidebar Ranch
13>Box 18
14>RR 37
15>Cloudburst, WY 72632
16>
=================================================
Figure 1 A sample mailing list in vertical ASCII format. NOTE: Line numbers are not included in the actual file but are shown here to amplify irregularities in the file format.

The problem here is twofold. First, we must determine what separates one record from another, and second, we must decide upon a strategy for moving the data into our dBASE IV database.
At first glance it might seem that there would be little which would distinguish one address from another. They all have an addressee which differs with each occurrence. The same can be said for the one or more lines of street address (an additional problem we hadn't mentioned) as well as the city, state, and zip codes. Actually they all do have something in common and that is the pattern which expresses the city, state, and zipcode.
In the real world it is not unusual to see addresses with typographical errors. For example, we have intentionally omitted the comma between the city and state of the second address to illustrate a point: although there are inconsistencies between various presentations of the CSZ (city, state, zip code) information, there are sufficient overlapping characteristics to give a high probability of successfully recognizing the CSZ line. This high recognition rate is usually sufficient to mark the boundary between records, even though they may be of variable length with no special EOR (end of record) marker.
The AWK program in Listing 1, MailList.AWK, will translate information in this form into a comma delimited, carriage return and linefeed separated ASCII text file suitable for APPENDing into any version of dBASE. Because this is one of the more complicated AWK programs in this article, we will take some time to examine it in detail so that we can more easily understand those which will follow.
The first pattern following the header comment section is the BEGIN pattern. As we mentioned earlier, the BEGIN pattern statements are executed once before any input has been read. Typically this is the place in an AWK program where various initializations are performed. In this particular case we have initialized several regular expression patterns, a system variable, and some local variables.
The first pattern, csz1, is designed to match one possible variation of the CSZ line. In this case the "^" character means that the match must occur at the start of the line. The "[A-z]+[, ]+" portion indicates that we are looking for one or more contiguous upper or lower case alphabetic characters which are immediately followed by one or more commas or spaces. (In ASCII, the range A-z also takes in the six punctuation characters which lie between "Z" and "a".)
The next portion of the pattern, "[A-Z][A-Z][ ]+", demands that the previous segment be followed by exactly two capital letters, which must in turn be followed by one or more spaces. Finally we have the five consecutive "[0-9]" character classes, which mean that for a successful match the spaces must be followed by five digits in the range zero to nine. If all of the preceding criteria are true then the pattern matches and the statements following the pattern will be executed. The other two regular expressions are similar in form and have the purpose of attempting to take into account the majority of variations of the CSZ line which we might find in our data.
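To see how such a pattern can be used to take a CSZ line apart, here is a small self-contained sketch built on the csz2 pattern from Listing 1. The variable names line, csz and part are hypothetical, and the sketch assumes a POSIX-style AWK in which match() accepts a pattern held in a variable (the errata in section 4.2 notes that PDAWK may not).

```awk
BEGIN {
    # Two capital letters, one or more spaces, then five digits.
    csz  = "[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]"
    line = "Redondo Beach CA 90325"     # CSZ line taken from Figure 1

    if (match(line, csz) > 0) {         # sets RSTART and RLENGTH
        city = substr(line, 1, RSTART - 1)      # text before the state
        sz   = substr(line, RSTART, RLENGTH)    # "CA 90325"
        split(sz, part, " ")                    # part[1]=state, part[2]=zip
        print "city=[" city "] state=" part[1] " zip=" part[2]
    }
}
```

The program prints "city=[Redondo Beach ] state=CA zip=90325". Note the trailing space left on the city name, since everything before the state abbreviation is taken as the city.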
A brief word about some of the variables used in the program is in order. The system variable, OFS, refers to the Output Field Separator. This is the character or series of characters which will separate items in the output stream. In the case of a comma delimited file it is, of course, a comma. In AWK, the OFS normally defaults to a space (ASCII character 32). The program variable dquote has the value of the double quote mark and maxlines is set to the maximum number of lines per address.
Oddly enough, apart from BEGIN there is only one pattern in this program, and it is the "NULL" pattern. Because an omitted pattern matches every line, every line in the file will be read and processed by the {actions} of the NULL pattern. If we were to think of each line as a dBASE record then this is the AWK equivalent of the dBASE language "DO WHILE .NOT. EOF()" looping construct.
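To see the effect of OFS in isolation, consider the sketch below, which assumes a POSIX-style AWK and input lines of at least three blank-separated words.

```awk
# Use a comma, rather than the default space, between output items.
BEGIN { OFS = "," }

# The empty ("NULL") pattern matches every line. Each comma in the
# print statement is replaced by the value of OFS on output.
{ print $1, $2, $3 }
```

For the input line "Mrs. Mary Smart" the output is "Mrs.,Mary,Smart".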
An examination of the action portion of the NULL pattern will reveal a basic IF..ELSE construct. The outer IF condition states that if the current line ($0) matches csz1, csz2 or csz3, then we have found a CSZ line. Depending upon which pattern was matched, we use the appropriate parsing action to split out the city, state, and zip code information. Since the matching of a CSZ line indicates that we are at the end of an address, the last action to perform is to output our field information in comma delimited form. Otherwise, we can assume that we are either somewhere between the end of the last record and the beginning of the next or that we are processing valid address lines. In either case, the test "($0 !~ /^$/)" means that action is taken only if the line is not blank. If the line isn't a blank line then we store the line in the address array for later output when we have detected the end of the record. Except for some minor housekeeping details, this is the essence of the program and should help you understand the rest of the programs which follow.
Line and Record Terminators
Some data are received with non-standard EOL (End Of Line) and EOR (End Of Record) indicators. These can range from infrequently used ASCII punctuation to special control characters or graphic symbols. For example, it is not uncommon for businesses to own both IBM compatible and Apple computers. However, each has its own common form of storing text files. In the IBM/DOS environment, lines of text will normally be terminated with a carriage return/linefeed combination. Macintosh computers on the other hand, by convention, omit the linefeeds, leaving only carriage returns as the EOL separator. Normally a conversion between the two standards can be accomplished by using AFE (Apple File Exchange) on the Mac or a specialized program on the PC. Other programs can also be found to do this but they are usually written in a compiled language and more often than not the source or programmer is not available if and when modifications are required. For these occasions it's handy to be able to do it yourself, and the short AWK programs in Listing 2, AddLF.AWK and StripLF.AWK, will let you do just that.
Importing Into Memo Fields
One of the more common (and complex) requests received at the Ashton-Tate Software Support Center deals with importing compound data structures into the dBASE IV environment. A typical case is data organized as one or more lines of fixed field information followed by varying lengths of freeform text, such as a bulletin board message dump or an information service forum message listing. An example is shown in Figure 2. While the previous examples may also be accomplished with pure dBASE programming, this type of scenario is very difficult for the novice programmer and extremely inefficient if parsed only with the facilities of the dBASE language itself. The AWK implementations ATBPars.AWK and ATMsgPar.AWK shown in Listings 3 and 4 are not only efficient but succinct and provide a foundation for variations on this theme with little effort.

================<<<>>>===============
#77664 22-NOV-89 06:47 From: MMM To: /DB4 (1 reply)
Re: Archiving Messages
TO: Roger
I have what may seem like a strange problem but I've noticed that there are a lot of good problems and solutions which appear on the Ashton-Tate Bulletin Board. Many of these I would like to refer to later, while I'm programming or designing an application.
Currently I just capture them to a text file but I would like to bring them into dBASE for organizing and quick retrieval. I'd like to know if there is an easy way to get these messages into a dBASE database? Any ideas???
Thanks,
MMM
Figure 2 A typical example of a non-standard text format that can make importing into dBASE something of a chore.
===============================================
Running The Programs
To run any of the programs mentioned in this article you should select the corresponding batch file (see page 19) with the appropriate command line syntax for your version of AWK. If you prefer, you may execute them directly from the DOS prompt. You may need to modify the batch file if your AWK command line syntax differs from either of those provided. There are two different versions of AWK mentioned in this article. The first is PDAWK, a freeware implementation which is available on the Ashton-Tate BBS, CompuServe, or GEnie. The other is PolyAWK, a commercial product. The following command line syntax pertains to PDAWK:
AWK output.fil
And this for PolyAWK:
AWK -f output.fil
Special Notes
The batch file PDPARSE is for the distributable version (which is not public domain but free for non-commercial use) of AWK which may be downloaded from the DOS utilities library. PolyAWK, the commercial implementation of the AWK language, is considerably faster.
Execution of the batch file will invoke the AWK interpreter, read in the ATMsgPar.AWK script and produce the file ATMsgPar.O and numerous files of the form Mnnnnn.TXT where "nnnnn" is the message number associated with the text in the file. Be aware that the AWK program will create a file for each message in the captured text file. This means that you should set the program and the associated files up in a subdirectory and not in the root of your hard disk, since DOS limits the number of entries in the root directory (512 on a typical hard disk). Additionally you will need free disk space roughly equal to 2.5 times the size of the ASCII capture file.
After the header and message files have been created, run the program GetMsgTx.PRG shown in Listing 5. It appends the header file ATMsgPar.O and then makes a second pass to gather the text body of each message into the corresponding memo field. The .DBF structure used by GetMsgTx is shown in Figure 3 below.

Field  Field Name  Type       Width  Dec  Index
    1  MSGNUM      Character      6         N
    2  SIG         Character     20         N
    3  DATE        Date           8         N
    4  TIME        Character      8         N
    5  SUBJECT     Character     30         N
    6  FROM        Character     40         N
    7  TO          Character     40         N
    8  MSG         Memo          10         N
** Total **                     163

Figure 3 Structure for database: MSGSKEL.DBF
===========================================================

Now It's Your Turn
There are many other situations where text preprocessing is required. If the text contains any kind of special character or repeating text pattern then it is usually easy to construct an AWK program to transform it into a file which dBASE IV can easily read. This includes, as in the message parsing example, source files which contain variable length "chunked" text associated with fixed field header information.
In most cases a judicious application of the phrase "...the right tool for the right job" applies, and the use of the AWK programming language to facilitate Import and Export in the dBASE environment is no exception. Although we have covered just enough of the AWK language to help you understand and use the programs presented in this article, with the aid of the AWK reference book mentioned, you should be able to quickly familiarize yourself with AWK in sufficient detail to tackle your own tenacious Import or Export problems.
Code References:
Listing 1: MailList.AWK
========================================================
# PURPOSE..: Parse address list into dBASE delimited format.
# AUTHOR...: Roger Wegehoft
# COPYRIGHT: Copyright (C) 1989 Ashton-Tate

BEGIN {
    # Setup patterns.
    csz1 = "^[A-z]+[, ]+[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]";
    csz2 = "[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]";
    csz3 = "[A-Z][A-Z][ ]+[0-9]+[-][0-9]+";

    # Initialize variables.
    OFS = ",";
    dquote = "\042";
    addrline = 1;
    maxlines = 5;    # Maximum lines per address
                     # (including City, State, Zip line).
}

# Read every line.
{
    if (($0 ~ csz1) || ($0 ~ csz2) || ($0 ~ csz3)) {
        if (match($0,csz1) > 0) {
            # With comma.
            city = substr($0,1,index($0,",")-1);
            sz = substr($0,index($0,",")+2,length($0)-index($0,",")+2);
            split(sz,cityzip," ");
            state = cityzip[1];
            zip = cityzip[2];
        }
        else {
            if (match($0,csz3) > 0) {
                # Sets RSTART & RLENGTH
                si = RSTART;
            }
            else if (match($0,csz2) > 0) {
                # Sets RSTART & RLENGTH
                si = RSTART;
            }
            city = substr($0,1,RSTART-1);
            sz = substr($0,si,length($0)-RLENGTH);
            split(sz,cityzip," ");
            state = cityzip[1];
            zip = cityzip[2];
        }
        outline = "";
        # Address lines 1 .. maxlines-1 (loop bound inferred from
        # maxlines and the sample output in section 4.3).
        for (i=1; i<maxlines; i++) {
            outline = outline dquote address[i] dquote ",";
            address[i] = "";    # Reset so a shorter address that
                                # follows does not inherit old lines.
        }
        outline = outline dquote city dquote ",";
        outline = outline dquote state dquote ",";
        outline = outline dquote zip dquote;
        printf("%s\n",outline);
        addrline = 1;
    }
    else if ($0 !~ /^$/) {
        # pickup address lines
        # print $0;
        address[addrline]=$0;
        addrline++;
    }
}
Listing 2: AddLF.AWK and StripLF.AWK
================================================================
# PROGRAM..: AddLF.AWK
# PURPOSE..: Add linefeeds to the end of text lines.
# AUTHOR...: Roger Wegehoft
# NOTE.....: Requires PolyAWK for RAWMODE operation.

BEGIN {
    RAWMODE = 7;
    RS = "\r";       # Input records end with a carriage return.
    ORS = "\r\n";    # Output records end with CR + linefeed.
}

{
    print $0;
}

# PROGRAM..: StripLF.AWK
# PURPOSE..: Remove linefeeds from the end of text lines.
# AUTHOR...: Roger Wegehoft
# COPYRIGHT: Copyright (C) 1989 Ashton-Tate.

{
    printf("%s\r",$0);    # Write each line followed by CR only.
}
Listing 4: ATMsgPar.AWK
=================================================================
# PURPOSE..: Split Ashton-Tate BBS messages logs into separate files.
# AUTHOR...: Roger Wegehoft
# COPYRIGHT: Copyright (C) 1989 Ashton-Tate.
#            10/07/89

BEGIN {
    OFS = ",";
    dquote = "\042";
    f0 = "ATMsgPar.TXT";    # Header (fixed field) output file.
}

{
    # A message header starts with "#nnnnn" followed by a DD-MMM-YY date.
    if (($0 ~ /^#[0-9]+/) && ($2 ~ /[0-9][0-9]-[A-Z][A-Z][A-Z]-[0-9][0-9]/)) {
        # Close the previous message file, if there is one.
        if (length(f1)>0) { close(f1); };
        msg = substr($1,2,length($1)-1);
        day = substr($2,1,2);
        year = substr($2,8,2);
        if (substr($2,4,3)=="JAN") { month = "01" };
        if (substr($2,4,3)=="FEB") { month = "02" };
        if (substr($2,4,3)=="MAR") { month = "03" };
        if (substr($2,4,3)=="APR") { month = "04" };
        if (substr($2,4,3)=="MAY") { month = "05" };
        if (substr($2,4,3)=="JUN") { month = "06" };
        if (substr($2,4,3)=="JUL") { month = "07" };
        if (substr($2,4,3)=="AUG") { month = "08" };
        if (substr($2,4,3)=="SEP") { month = "09" };
        if (substr($2,4,3)=="OCT") { month = "10" };
        if (substr($2,4,3)=="NOV") { month = "11" };
        if (substr($2,4,3)=="DEC") { month = "12" };
        xdate = "19" year month day;
        xtime = $3;
        from = $5;
        to = $7;
        sig = $7;
        f1 = "M" msg ".TXT";    # One text file per message.
        # Uncomment one or the other of the following lines, depending upon
        # which type of in-memo cross referencing you prefer; partial, full, or
        # comment out both if you prefer none at all.
        printf("%s\n",msg) >>f1;
        # printf("%s\n",$0) >>f1;
        getline;                # The next line holds the subject.
        subject = substr($0,5,length($0)-4);
        s = dquote msg dquote "," dquote sig dquote "," dquote xdate dquote "," dquote xtime dquote "," dquote subject dquote "," dquote from dquote "," dquote to dquote;
        printf("%s\n",s) >>f0;
    }
    else {
        # Any other line belongs to the body of the current message.
        printf("%s\n",$0) >>f1;
    }
}
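As one possible alternative to the chain of twelve month tests in ATMsgPar.AWK, the month number can be computed from the position of the abbreviation within a single string. This is only a sketch: the names names, stamp and pos are hypothetical, and it relies on the built-in index() function, which the errata in section 4.2 reports as missing from PDAWK.

```awk
BEGIN {
    names = "JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC"
    stamp = "22-NOV-89"                 # date field from Figure 2

    # Each abbreviation starts at position 1, 4, 7, ... so that
    # (position + 2) / 3 yields the month number 1 through 12.
    pos   = index(names, substr(stamp, 4, 3))
    month = sprintf("%02d", (pos + 2) / 3)

    # Assemble the date as CCYYMMDD, as the listing does.
    print "19" substr(stamp, 8, 2) month substr(stamp, 1, 2)
}
```

For the date shown, the program prints 19891122. An abbreviation that is not in the list gives a position of zero, so a production version would want to test for that case.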
Listing 5: GetMsgTx.PRG
=============================================================
* PROGRAM...: GetMsgTx.PRG
* TECHNOTES.: 08/18/89
* AUTHOR....: Roger Wegehoft
* PURPOSE...: To read the output of ATMsgPar.AWK
*             into a dBASE IV database.
* NOTES.....: This is just a SAMPLE.
*             You'll probably have to change paths & etc.

SET TALK OFF
PUBLIC mtext
mtext = ""

SET PATH TO c:\dbase                    && Location of text files.
USE MSGSKEL                             && Message dbf template.
COPY STRUCTURE TO ATMSG                 && Copy structure to working dbf.
USE ATMSG

APPEND FROM atmsgpar.o TYPE DELIMITED   && Read the message header info.
USE                                     && Close & save.
USE ATMSG
GO TOP
CLEAR

*--- This loop reads the "body" of the message into the memo field.
SCAN
   mtext = "M" + TRIM(msgnum) + ".TXT"
   @ 1,0
   @ 1,0 say "Appending message no.: "+msgnum
   APPEND MEMO msg FROM (mtext) OVERWRITE
ENDSCAN

SET TALK ON
RETURN
The AWK programming language was originally designed and implemented by Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger in 1977 as an experiment to see how the Unix tools grep, a program which finds patterns composed of regular expressions, and sed, a text editor, could be generalized. AWK is a trademark of AT&T Bell Laboratories. POLYTRON AWK is a product of Polytron Corporation. PDAWK is a freeware implementation, available on the Ashton-Tate BBS, GEnie and CompuServe.
REFERENCES:
The AWK Programming Language, Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger. Addison-Wesley, 1988.
4.2 Errata & Addendum
Errata & Addendum to the June'90 TechNotes article "I Am Not a Parrot."
Because of deficiencies in the implementation of PDAWK which went unnoticed at the time of the article's preparation, some programs presented in the article, which work perfectly well with PolyAwk(tm), the commercial version of AWK mentioned in the article, may not work properly under PDAWK. One example of this is the MailList program of Listing 1. PDAWK appears to have trouble substituting variables for regular expressions, thus requiring that the pattern variables be replaced directly by the patterns themselves. Furthermore, the built-in function index(s,t) does not appear to be implemented. Fortunately, the match(s,r) function may be substituted for it by supplying a regular expression in place of the target string of the index function.
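The substitution of match() for index() can be checked with a sketch like the following, which assumes an AWK that implements both functions (so it serves only to compare the two results, not as a PDAWK test). The variable name line is hypothetical.

```awk
BEGIN {
    line = "Littletown, OH 32532"

    # index(s, t) returns the position of the string t within s;
    # match(s, r) returns the position of the first match of the
    # regular expression r. Both count from 1 and return 0 when
    # nothing is found.
    print index(line, ","), match(line, /[,]/)
}
```

Both calls locate the comma at position 11, so the program prints "11 11".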
Another problem with PDAWK is that it cannot properly recognize the regular expression pattern:
/^$/
which is used to locate a null or empty line. The test "$0 !~ /^$/" can be replaced with the expression "length($0) != 0", which will be correctly interpreted in both versions of AWK.
Other implementation differences may exist that affect the proper operation of some of the programs presented in the article. I will attempt to find common solutions to both versions as these are brought to my attention. In the meantime I would recommend that, if you are serious about using AWK, you consider investing in the commercial version, PolyAwk(tm), due to its more robust and efficient implementation.
Roger Wegehoft
Support Services
Ashton-Tate
4.3 MAILLIST
"Mr. John Nice","1234 Wee St.","","","Littletown","OH","32532"
"Mrs. Mary Smart","Apt #7","Sea Breeze Towers","","Redondo Beach ","CA","90325-1234"
"Fred Rumble Sr.","Sidebar Ranch","Box 18","RR 37","Cloudburst","WY","72632"
4.4 MAIL2 (A modified version for PDAWK)
BEGIN {
    # Program: Mail2.AWK
    # Author: Roger Wegehoft
    # Purpose: Parse address list into dBASE delimited format.
    # Setup patterns.
    # NOTE: This is a modified version of Maillist.AWK which works in both
    #       PolyAwk(tm) and PDAWK.
    # Unused variables:
    # csz1 = "^[A-z]+[, ]+[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]";
    # csz2 = "[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]";
    # csz3 = "[A-Z][A-Z][ ]+[0-9]+[-][0-9]+";

    # Initialize variables.
    OFS = ",";
    dquote = "\042";
    addrline = 1;
    maxlines = 5;    # Maximum lines per address (including City, State, Zip line).
}

# Read every line.
{
    # if (($0 ~ csz1) || ($0 ~ csz2) || ($0 ~ csz3)) {
    if (($0 ~ /^[A-z]+[, ]+[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]/) ||
        ($0 ~ /[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]/) ||
        ($0 ~ /[A-Z][A-Z][ ]+[0-9]+[-][0-9]+/)) {
        if (match($0,/^[A-z]+[, ]+[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]/) > 0) {
            # With comma.
            city = substr($0,1,match($0,/[,]/)-1);
            sz = substr($0,match($0,/[,]/)+2,length($0)-match($0,/[,]/)+2);
            split(sz,cityzip," ");
            state = cityzip[1];
            zip = cityzip[2];
        }
        else {
            if (match($0,/[A-Z][A-Z][ ]+[0-9]+[-][0-9]+/) > 0) {
                # Sets RSTART & RLENGTH
                si = RSTART;
            }
            else if (match($0,/[A-Z][A-Z][ ]+[0-9][0-9][0-9][0-9][0-9]/) > 0) {
                # Sets RSTART & RLENGTH
                si = RSTART;
            }
            city = substr($0,1,RSTART-1);
            sz = substr($0,si,length($0)-RLENGTH);
            split(sz,cityzip," ");
            state = cityzip[1];
            zip = cityzip[2];
        }
        outline = "";
        # Address lines 1 .. maxlines-1 (loop bound inferred from
        # maxlines and the sample output in section 4.3).
        for (i=1; i<maxlines; i++) {
            outline = outline dquote address[i] dquote ",";
            address[i] = "";    # Reset so a shorter address that
                                # follows does not inherit old lines.
        }
        outline = outline dquote city dquote ",";
        outline = outline dquote state dquote ",";
        outline = outline dquote zip dquote;
        printf("%s\n",outline);
        addrline = 1;
    }
    # else if ($0 !~ /^$/) {
    else if (length($0) != 0) {
        # pickup only non-blank address lines
        # print $0;
        address[addrline]=$0;
        addrline++;
    }
}