File: SFILES-B.1
a part of U-SEDIT1.ZIP
Version 1 - January 15, 1990
Written by Mike Arst, c/o 1605 12th Ave., #10, Seattle, WA 98122
FidoNet address: send e-mail to 1:343/8.9 *by way of* 1:343/8.0.
All text is Copyright 1990 Mike Arst.
You may copy these files and transmit them in *unaltered* form to
computer bulletin boards. You may print out the text of the files
and/or photocopy the printouts for personal use. This text may
not be reproduced or published for any other purpose, by any
means now known or to be later developed, without the express
written permission of its author.
No one may charge a fee specifically for the distribution of this
file nor for the distribution of the others in the U-SEDIT1.ZIP
archive file (with "?" representing a version number), which
files include U-SED-IT.1, REGEXP.1, REFORMAT.INF, SFILES-A.1,
SFILES-B.1, and SED.EXE.
Copyright notices, and all language related to usage of this
text, must be retained in the files.
All proprietary names herein, such as Microsoft, DOS, MS-DOS,
Unix, and so on, are the property of their various owners, blah
blah blah.
If you upload the U-SEDIT archive to a bulletin board, please
upload it with all files that were in it when you got it.
SED SCRIPT FILES, PART II:
MANIPULATING THE HOLD AND WORK SPACES
Lucky you - this is going to be the skimpiest part of the
documentation. Reason: it's the stuff I know least about.
As mentioned a long time ago in a piece of documentation far, far
away, SED reads the input file line to be processed and copies
it into an area of memory known to SED freaks as the work space.
(It's also known as the pattern space - but you didn't want to
know that, did you? Neither did I. Too late. Now we both know it
- sorry.) There is another such memory area that doesn't
automatically contain any information, but into which you can
tell SED to move data for later manipulation: the hold space.
There are a number of commands dedicated to manipulating text in
the hold and work spaces; they are the most poorly explained of
File SFILES-B.1 - about SED - Copyright 1990 Mike Arst Page: 1
the SED commands I've read about so far. What follows is the
truly meager amount that I know about messin' with the work
space. (The hold space? Beats me, I'm sorry to say).
First, the bare-bones list - about all you get from a lot of SED
documentation. Remember as you read it that the commands, like
most SED commands, are case-sensitive. The numbers in parentheses
indicate how many "addresses" you can tell SED to look for in
executing any one of these instructions:
"d" (2) - Delete the contents of the current work space,
and go to the next line.
"D" (2) - In the current work space, delete the first line
in the work space up to the first "newline" character embedded
within it (more on that "newline" character shortly). Now process
the next line in the file (after printing - I assume - what's
left of the work space to standard output).
"g" (2) - Take whatever is in the hold space and use it to
replace, completely, the contents of the present work space.
"G" (2) - Take whatever is in the hold space and append its
contents in their entirety to the contents of the work space.
(Append means "add to the *end* of." Appending does not replace
the text to which something is being appended.)
"h" (2) - Take whatever is in the current work space and
copy it into the hold space (entirely replacing what is now in
the hold space).
"H" (2) - Take whatever is in the current work space and
append its contents to those now in the hold space.
"n" (2) - Print the contents of the current work space to
standard output, then read the next line of the input file into
the work space. The SED docs I've been looking over say nothing
about whether the contents of the work space are zeroed out at
the same time that they're printed. So the question is: what,
*exactly,* is in the work space following the "n" command?
My favorite SED litany: "Beats me."
"N" (2) - Append a "newline" character to whatever is in
the work space, then append the contents of the next line of the
file to the work space. This is said to increment the current
line number upward by one.
"p" (2) - Print whatever is now in the work space (to
standard output).
"P" (2) - Print whatever is now in the work space -
everything up to the first "newline" character embedded within
it - to standard output. The docs don't say whether this affects
the contents of the work space. Is the text up to the "newline"
removed? What about the text, if any, *following* the "newline"?
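The two open questions in the list above (what is in the work space after "n", and whether "P" changes the work space) can be poked at from a command line. This is only a sketch, assuming a Unix-style shell with printf and a sed that behaves like GNU sed; the SED.EXE in this archive may not act the same way. The variable names are made up for the illustration. The "l" command, not covered in the list, prints the work space in a visible form, with "\n" for an embedded newline and "$" at the end.

```shell
# "n": the old line is printed, then the NEXT line becomes the whole
# work space. The substitution therefore touches only every second line.
n_out=$(printf 'one\ntwo\nthree\nfour\n' | sed 'n;s/^/> /')
printf '%s\n' "$n_out"
# one
# > two
# three
# > four

# "P": prints up to the first embedded newline, but a following "l"
# shows the work space still holding both lines, so nothing was removed.
p_out=$(printf 'one\ntwo\n' | sed -n 'N;P;l')
printf '%s\n' "$p_out"
# one
# one\ntwo$
```

If that holds for your SED too, then "n" replaces the work space outright, and "P" leaves it alone.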
Ok, fine. I don't know what the devil most of that means any more
than you do. But, after literally days of attempting to make
sense of the [expletive deleted] SED documentation I've had to
work with, I was able to get one - count 'em - one of these
commands to work the way I wanted it to work in a script file.
Will miracles never cease? Probably not ...
Here was the goal: a SED script had already created many lines in
a row with the same character string at the beginning of each
line. It was an expedient way to handle most of the job, but then
it came time to remove all of the duplicated strings, leaving
only one of them at the beginning of each *paragraph* (any block of
contiguous lines set off from other such blocks by one or more
blank lines).
In other words, if the text to be altered looked like this - with
"XXX" representing the duplicated text I wanted to get rid of:
XXXLine one
XXXLine two
XXXLine three

XXXLine four
XXXLine five

XXXLine six
XXXLine seven
the desired result was:
XXXLine one
Line two
Line three

XXXLine four
Line five

XXXLine six
Line seven
This is the script file which did the trick:
/XXX/ {
:loop
N
s/^\(XXX.*\n\)XXX\(.*\)$/\1\2/
t loop
}
First, the script file tells SED to work *only* on lines which
contain the characters "XXX." Now set up a loop so that the
processing will continue until there are no more strings reading
"XXX" to be processed.
"N" - append a "newline" to the work space, and then append the
*next* line of the input file to the work space.
Now there is a "newline" character in the current work space, and
the "newline" character can actually be *searched for* - not
something easily possible with other SED instructions you've
seen, since SED stops reading at the end of the line.
The "newline" character is referred to in script file
instructions this way:
\n
That I know of, "\n" can be used *only* within script file
instructions. Just what SED uses in memory for a "newline"
character, I don't know. It might be a line feed alone,
translated - when SED prints to standard output - to a carriage
return plus a line feed. For argument's sake let's say it's a
CR/LF pair and leave it at that.
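A quick way to see that "\n" becomes searchable only after "N" has pulled a second line into the work space. This is a sketch, assuming a Unix-style shell and a GNU-like sed; the variable names are invented for the example.

```shell
# Without "N" the work space never contains a newline, so nothing matches
# and the two lines come out unchanged.
plain=$(printf 'first\nsecond\n' | sed 's/\n/ + /')
printf '%s\n' "$plain"
# first
# second

# With "N" the two lines sit in the work space joined by "\n",
# and the substitution can find it and replace it.
joined=$(printf 'first\nsecond\n' | sed 'N;s/\n/ + /')
printf '%s\n' "$joined"
# first + second
```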
The substitution command:
s/^\(XXX.*\n\)XXX\(.*\)$/\1\2/
which breaks down as follows:
s/ begin substitution
^ at the beginning of the input line ...
\( begin first of two numbered groups, which includes:
XXX.*\n
"XXX," followed by any single character occurring any
number of times, followed by the just-inserted "newline"
character.
\) end of the first group
XXX second occurrence of "XXX." (the occurrence which begins the
line right below the original input line. In other words: if line
1 is the first input line, then the second occurrence of "XXX" is
the one that begins line 2.)
\( begin the second group
.* any number of any character
\) end the second group
$ end of input altogether - actually, the end of line 2
(since it is now within the work space). Now here SED goes back
to reading up to, but not past, the end of the line. Although
line 2's contents have already been added into the work space,
those contents do *not* include the "newline" character which ends
line 2 itself - only the "newline" character *between* line 1 and
line 2.
/\1\2 replace it all with: the contents of the first group,
followed immediately by the contents of the second group.
Now notice: what is missing from the combination of "first group
followed immediately by second group?" The *second* occurrence of
"XXX," that's what. This substitution command, therefore,
eliminates the *second* occurrence of "XXX," changing nothing else
in the work space, including the "newline."
I found that if I eliminated the looping function from the SED
script, the instruction would fail. If I eliminated the { }
construction, it would also fail.
Breaking it down further - consider only lines 1 and 2 for a
moment:
When line 1 is read, this is what goes into the work space:
XXXLine one
Then the "N" command pushes a "newline" character into the space,
and after that the contents of the next line (line 2). Continuing
to use the "newline" notation shown earlier, now the work space
contains:
XXXLine one\nXXXLine two
When the substitution command creates groups, then the work space
looks like this (using angle-brackets and spaces to separate the
groups, just for emphasis):
<XXXLine one\n> XXX <Line two>
Note that "XXX," the second time around, is *not* within a group.
When the substitution command is executed, the contents of the
work space are as follows:
XXXLine one\nLine two
and when the work space is printed to standard output, because of
the "newline" character embedded in the work space, this is
what's printed:
XXXLine one
Line two
In other words, the "\n" - symbolizing a "newline" in the work
space, is translated back into an *actual* line-ending when SED
prints the work space.
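The stages just described can be watched directly with the "l" command, which prints the work space with the embedded newline shown as "\n" and the end marked by "$". This is a sketch only, assuming a Unix-style shell and a GNU-like sed. The names "before", "groups" and "after" are made up here, and the angle brackets in the second command are just decoration added to the replacement text.

```shell
two_lines='XXXLine one\nXXXLine two\n'

# Work space right after "N": both lines, joined by a newline.
before=$(printf "$two_lines" | sed -n 'N;l')
printf '%s\n' "$before"
# XXXLine one\nXXXLine two$

# Same match, but with the two groups marked off so they can be seen.
groups=$(printf "$two_lines" | sed -n 'N;s/^\(XXX.*\n\)XXX\(.*\)$/<\1> XXX <\2>/;l')
printf '%s\n' "$groups"
# <XXXLine one\n> XXX <Line two>$

# The real substitution: group one followed by group two, second "XXX" gone.
after=$(printf "$two_lines" | sed -n 'N;s/^\(XXX.*\n\)XXX\(.*\)$/\1\2/;l')
printf '%s\n' "$after"
# XXXLine one\nLine two$
```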
In trying to understand why such instructions work, I come up
with the following: "Beginning of line followed by 'XXX' followed
by '\n' followed by anything followed by end of line" is true at
first on every line - before SED gets to it. After the *first*
successful processing, line 1 begins with "XXX" and line 2 no
longer does.
Two things happen, then: 1) the search criteria, no longer met,
cause SED to terminate the loop (see file SFILES-A.1 for more
information about loops); 2) the next line in the file, because
it no longer begins with "XXX," cannot be processed; you have
restricted the processing (with the { } structure) to lines which
begin that way. The next line is therefore skipped. If the one
after it *does* begin with "XXX," SED reads it into the work
space.
When SED hits the end of a paragraph - a line followed by one or
more blank lines - the last line doesn't begin with "XXX" any
longer, nor does the one after it (the blank line). But the one
after the blank line - the start of a new paragraph - does start
with "XXX"; processing begins again there.
In short, the *only* lines on which SED will permit "XXX" to
occur are those which begin a paragraph. Eureka.
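To try the whole thing end to end, here is a sketch that feeds the sample paragraphs through the script. It assumes a Unix-style shell and a GNU-like sed. The names "dedupe_script" and "result" are invented for the example. The sample input deliberately ends with a blank line, so that "N" never runs out of input in the middle of the loop; what SED does in that case varies between implementations.

```shell
# The script from above, held in a shell variable instead of a script file.
dedupe_script='/XXX/ {
:loop
N
s/^\(XXX.*\n\)XXX\(.*\)$/\1\2/
t loop
}'

# Three paragraphs separated by blank lines, every line starting with XXX.
result=$(printf 'XXXLine one\nXXXLine two\nXXXLine three\n\nXXXLine four\nXXXLine five\n\nXXXLine six\nXXXLine seven\n\n' | sed "$dedupe_script")
printf '%s\n' "$result"
# XXXLine one
# Line two
# Line three
#
# XXXLine four
# Line five
#
# XXXLine six
# Line seven
```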
KILLING MULTIPLE BLANK LINES
I often wondered how to use SED to kill multiple blank lines in a
file - that is, not simply how to kill all blank lines (that's
easy enough), but how to search for any unspecified number of
contiguous blank lines and replace them with only a *single* blank
line? The "N" command again comes to the rescue.
/^$/ {
:loop
N
s/^\n$//
t loop
}
Again, it calls for a restriction: perform the work with regard
*only* to blank lines. And again there is a loop structure set up.
The { / } and ":loop / t loop" structures must both be present
for this script to work, although, curiously, you can swap the
positions of the { / } structure and the loop, and it'll still
work.
First SED reads a blank line into the work space. Then it pushes
a "newline" into the work space, finally adding the contents of
the next line. Thus endeth the "N" command.
The substitution command tells SED to find: beginning of line,
followed immediately by "\n", followed immediately by end of
line. The *only* situation in which the search criteria can be met?
Two blank lines in a row. Why? Because you've already told SED to
restrict processing to blank lines. Therefore, the *only* initial
content of the work space is a blank line.
The substitution command tells SED to delete *everything* between
the beginning and end of input. But remember, the substitution
command cannot, itself, kill line boundaries. The effect is to
kill *only* the "\n" - leaving " ^ " and " $ " intact. If there
were two blanks in a row, only one remains; it is then printed to
standard output.
This substitution can occur only once *unless* the next line is
also blank. Because the substitution worked once, "t loop" sends
SED back to run the commands in the loop again. If the next line
is not blank, the substitution will fail. Thus - end of loop. SED
moves on.
But what if the next line *is* blank after all? Then the first is
again read into the work space; a "newline" is appended to the
work space; when the third item is appended, the search criteria
are again satisfied; the removal of "\n" occurs again. SED will
loop yet again - since the substitution did indeed occur ...
And so on and so forth. The net effect is: no matter how many
contiguous blank lines SED locates, at least *one* will remain
where before there might have been two, three, four (etc.) in a
row.
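A small test run of the blank-line script, as a sketch under the same assumptions as before (Unix-style shell, GNU-like sed). The names "squeeze_script" and "squeezed" are made up for the example. The sample input ends with a non-blank line, so "N" never hits the end of the file while inside the loop.

```shell
# The blank-line script from above, held in a shell variable.
squeeze_script='/^$/ {
:loop
N
s/^\n$//
t loop
}'

# Three blank lines after "alpha", two after "beta".
squeezed=$(printf 'alpha\n\n\n\nbeta\n\n\ngamma\n' | sed "$squeeze_script")
printf '%s\n' "$squeezed"
# alpha
#
# beta
#
# gamma
```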
As mentioned in another part of the documentation, "/^$/" will
not match a blank-looking line if there is even one space on it.
To be sure the above script works, as a safeguard you'll want to
ensure "clean" blank lines by way of:
s/^ \{1,\}$//
to remove all spaces from empty-*looking* lines.
*** On occasion, "N" instructions like the ones discussed here
will cause bizarre things to happen in a file if they appear in
conjunction with other script file instructions. I would
therefore recommend your keeping such complex commands by
themselves in their own script files, and process your text as
follows:
sed -f script1 inputfile | sed -f script2 > outputfile
where "script1" contains a set of instructions and "script2"
contains only the complex one with the "N" command and the
looping. I don't believe it will matter which script file is read
first, unless you have to do something with the string altered
via "N" *before* you run the more complex script command.
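Here is a sketch of that two-pass arrangement, using the space-stripping command as "script1" and the blank-line loop as "script2". It assumes a Unix-style shell with mktemp and a GNU-like sed. The temporary directory and the variable names "workdir" and "piped" are only for the illustration.

```shell
workdir=$(mktemp -d)

# script1: the simple cleanup, turning space-only lines into truly empty ones.
printf '%s\n' 's/^ \{1,\}$//' > "$workdir/script1"

# script2: only the "N" loop, kept by itself in its own script file.
cat > "$workdir/script2" <<'EOF'
/^$/ {
:loop
N
s/^\n$//
t loop
}
EOF

# Input: three empty-looking lines in a row, two of them containing spaces.
printf 'alpha\n   \n\n \nbeta\n' > "$workdir/inputfile"

sed -f "$workdir/script1" "$workdir/inputfile" | sed -f "$workdir/script2" > "$workdir/outputfile"

piped=$(cat "$workdir/outputfile")
printf '%s\n' "$piped"
# alpha
#
# beta

rm -rf "$workdir"
```

Run in the opposite order, the loop would see the space-only lines as non-blank and leave extra blank lines behind, so here the order does matter.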
As always: if you have even the slightest doubt about the result
of SED instructions, run a test before over-writing an important
source file. It's *so* easy to trash files that way; you get a
little lazy, and suddenly your hard work is reduced to 0 bytes.
It could spoil your whole day.
So that is the sum total of what I know about these kinds of
commands - but what the hell, a little information is either
dangerous or else moderately useful. Or both.
If you can explain to me and to the rest of the world how to use
"g," "h," "H," "n," and so on, then treat yourself to an all-
expenses-paid trip somewhere. They probably don't have any
implementations of SED hanging out on the street corners there,
but then isn't that why you'd want to go on vacation in the first
place?
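For anyone who wants a head start on that trip, here are two small hold-space one-liners that are common in SED folklore. They are a sketch only, assuming a Unix-style shell and a GNU-like sed; they have not been tried with the SED.EXE in this archive, and the variable names are made up.

```shell
# Reverse the order of the lines.
#   1!G  on every line but the first, append the hold space to the work space
#   h    copy the work space (new line first, older lines after) to the hold space
#   $p   on the last line, print what has piled up
reversed=$(printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"
# three
# two
# one

# Double-space a file: the hold space starts out empty, so "G" appends
# a newline plus nothing, which prints as a blank line after each line.
doubled=$(printf 'one\ntwo\n' | sed G)
printf '%s\n' "$doubled"
# one
#
# two
```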
(E N D)