Category : Word Processors
Archive   : USEDI.ZIP
Filename : U-SED-IT.1
SOME KINDA SORTA DIFFERENT DOCUMENTATION FOR SED
Version 1 - January 19, 1990
By Mike Arst, Box 5, 1407 E. Madison St., Seattle, WA 98122
FidoNet address: send netmail *by way of* 1:343/8.0.
FILES INCLUDED IN THIS ARCHIVE:
U-SED-IT.1 .... What you're reading right now: preliminary remarks
about the SED program; how to run it; its command-
line switches; other SED nuts 'n' bolts.
REGEXP.1 ...... About regular expressions
REFORMAT.INF .. How to reformat these docs (using SED, of course)
for decent-looking output on a dot-matrix printer
SFILES-A.1 .... About SED script files
SFILES-B.1 .... More about scripts -- dealing with the "hold" and
"work" spaces
SED.EXE ....... The SED program itself
MORE-SED.INF .. Things I learned about SED after the main files
were written.
All text is Copyright 1990 Mike Arst.
You may copy these files and transmit them in *unaltered* form to
computer bulletin boards. You may print out the text of the files
and/or make copies of the printouts for personal use. This text
may not be reproduced or published for any other purpose, by any
means now known or to be later developed, without the express
written permission of its author.
No one may charge a fee specifically for the distribution of this
file nor for the distribution of the others listed above.
Copyright notices, and all language related to usage of this
text, must be retained in the files.
All proprietary names herein, such as Microsoft, DOS, MS-DOS,
UNIX, and so on, are the property of their various owners, blah
blah blah.
If you upload the U-SEDIT archive to a bulletin board, please
upload it with all files that were in it when you got it.
WHAT'S GOING ON HERE, ANYWAY?
The SED program was originally developed for the UNIX operating
system. "SED," standing for "stream editor," is one of the most
powerful text processing tools you can get your hands on.
File U-SED-IT.1 - about SED - Copyright 1990 Mike Arst Page: 1
Typical SED documentation is just plain awful - unless you're a
UNIX super-techie and thus have the ability to understand all
kinds of documentation, however badly written and whether or not
anyone else can understand it.
UNIX techies seem to delight in poor documentation. "Well," they
say, not bothering to suppress a grin of triumph, "after all,
it's *only* a reference manual." As it turns out, virtually all UNIX
documentation is "only a reference manual" and none of it is
documentation!
This work - SED documentation for ordinary mortals - is my way of
getting back at the aliens in our midst (UNIX documentation
writers). It's also an attempt to contribute something to the
computer-using community. So many different authors of freeware
and shareware have helped me via their efforts. Maybe I can
return the favor, somehow.
My thanks to Tim Evans of Tacoma, WA, whose help with SED made it
possible for me to understand the program well enough to begin
using it in the first place. If you don't like what follows ...
well, blame Tim.
DISCLAIMERS
The documentation is far from complete, and that's why I have
given it a date and a version number. I myself have a long way to
go before I'm a SED expert. There are SED commands I have never
been able to figure out from the SED documentation I myself have
been reading. Gotta begin somewhere, though. As I learn more, I
will update this documentation.
There are several versions of SED reverse-engineered for DOS from
UNIX - versions from GNU, MKS, Mix Software, and no doubt others.
They are not exactly alike - so keep in mind that not all of the
commands supported by the SED.EXE program included with this
documentation will work with every other version of SED you might
encounter - including UNIX's own SED.
As far as I know, the SED version I've included is a public-
domain implementation which can be freely copied and shared. If
anyone believes otherwise, they should contact me at once
(mailing address and FidoNet address are at the top of this
file).
ABOUT THE FORMATTING OF THIS DOCUMENTATION
It's done in a way some people won't like. There are descriptive
page footers and form-feed characters in the file, but the left
margin is not indented - inconvenient if you're going to print it
out and put it into a three-ring binder. It probably looks as if
the line width is abnormally short. There's a reason: it makes for a
good first introduction to SED. You can use it to reformat the
files quickly and easily, giving them a four-character left
margin indent.
Information on how to do so is in the file REFORMAT.INF.
WHAT IS SED?
SED is a powerful non-interactive text editing tool developed, in
part (I reckon), to give UNIX documentation writers something to
louse up. It supports the use of regular expressions - methods of
symbolizing patterns of text such that the text doesn't always
have to be written out in full. The following is a simple
expression:
confused
It describes the lower case letters: "confused." You can also
call such an expression a "string literal" - a string of
characters to be taken quite literally. The following, on the
other hand, is a regular expression:
[Cc]onfuse[sd]
If you were to write it out in English, the regular expression
means: "Either a capital 'C' or a lower case 'c,' followed by the
string literal 'onfuse,' followed by either a lower case 's' or a
lower case 'd.'" In other words, it's a kind of shorthand that
describes any or all of the following:
Confused confuses confused Confuses
It's beyond me how someone came up with the term "regular
expression" for this kind of description, but there you have it.
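To see that shorthand at work, here is a minimal sketch. It
assumes a UNIX-style shell and a POSIX-compatible SED rather than
the DOS SED.EXE in this archive (so it uses single quotes, which
DOS won't accept), and the word list is made up for illustration.

```shell
# Assumption: POSIX shell and sed; the word list is invented.
words='Confused
confuses
confusing
confused
Confuses'

# -n turns off automatic printing; the p command then prints only
# the lines that match the regular expression.
matches=$(printf '%s\n' "$words" | sed -n '/[Cc]onfuse[sd]/p')
printf '%s\n' "$matches"
```

"confusing" is skipped: it has no "e" after "confus," so the
expression doesn't describe it.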
The file REGEXP.1 goes into the construction and meaning of
regular expressions in detail. I would suggest that you read
REGEXP.1 at the same time you read this file; otherwise, some of
the commands shown in this text won't make much sense. Of course,
if you read REGEXP.1 first, then because you haven't read *this*
file yet, REGEXP.1 won't make a lot of sense. No winning that
one, eh?
HOW SED WORKS
SED is a "line-wise" editor: it reads the input file one line at
a time and puts the contents of the line - *not* including the line
boundary characters at the end of the line - into a memory buffer
called the "work space."
The program manipulates the text in the work space if a command
to do so has been given and if the command correctly describes
the text in the buffer. Finally, it prints the contents of the
work space to "standard output" (the screen), flushes the
contents of the work space (unless you have told SED to retain
them), and reads the next line of the input file.
(Here is what I will mean any time I say "print" in this
documentation: sending information to standard output. You can
certainly give a command for the output to be redirected to a
file, but that is not "printing" as I will mean it here.)
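The read-edit-print cycle described above can be sketched as
follows, assuming a UNIX-style shell and a POSIX-compatible SED
(not the DOS SED.EXE included here). The three input lines are
invented. Only one of them matches the command, yet all three are
printed.

```shell
# Assumption: POSIX shell and sed; the input lines are invented.
# Each line goes into the work space, is edited if the command
# applies, and is then printed, changed or not.
result=$(printf 'one\ntwo\nthree\n' | sed 's/two/2/')
printf '%s\n' "$result"
```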
Line by line editing ... perhaps that makes it sound as if SED
were very sluggish. Believe me, it's not. It can process a pretty
good-sized file at a fair clip. The time to process a file will
depend on the number and complexity of your editing commands, but
rest assured: if you have a lot of search/replace work to do,
automating it with SED is going to be a *whole* lot faster than
running an interactive full-screen editor and doing a bunch of
semi-automated search-and-replace work. For a practical example,
see the file REFORMAT.INF.
WHAT IT READS; WHAT IT WON'T READ
SED reads ASCII files. It has trouble reading certain files
consisting entirely of "non-ASCII" characters - like binary files
(compiled programs).
I *have* been able to get it to read, and strip "highbits" from,
WordStar files. But when I tried to get it to process a pure
binary file it ground to a halt immediately. How it decides when
to run and when not to run, I don't know. I would imagine its
problem with binary files is that it can't figure out which parts
of the files are lines: where they begin; where they end. (A
"highbit" character, I should mention, is one having an ASCII
decimal value higher than 127. The box-drawing characters and
foreign-language symbols are examples.)
I strongly recommend against your trying to use SED to process
something like a Microsoft Word file saved in Word's proprietary
format. If SED alters the file, it's entirely possible Word will
crash, locking up your machine (time for the familiar old CTRL
ALT DEL treatment).
SED will not read past an end-of-file marker (CTRL Z) in an input
file, nor past a CTRL Z in one of its own script files. When it
is done processing a file, it will not add an end-of-file
character to the output.
Line lengths: As I recall, the "real" SED (under UNIX) has a
line-length limitation of something like 256 characters; it will
truncate lines any longer than that. The SED version included
with this documentation doesn't seem to have that small a limit,
but I have not determined for a certainty at what point it will
begin cutting off lines. If you have to process a file with very
long lines, do a test before trying to have SED over-write the
source file. I have successfully processed lines as long as 3500
characters without their being chopped - but no guarantees.
Legal and illegal line endings: SED is content with either line
feeds (CTRL J) alone, or the standard DOS line ending (CTRL M
followed by CTRL J). It *cannot* deal with files whose lines end
ONLY with CTRL M. It might be able to read them and display the
results of processing on-screen, but it doesn't seem able to
write such files back out to disk successfully, and this
version will not add line feeds to carriage returns if the line
feeds weren't present in the input file.
If you have a file to which you need to add line feeds before
processing the file with SED, I would suggest using John Kruper's
fast stream editor called CHG.EXE, which has some excellent
capabilities. CHG.EXE doesn't support regular expressions as SED
does, but it is terrific at, among other things, dealing with
line endings (not a trivial task with SED). CHG.EXE is a stream
editor which is *not* limited to line-by-line processing. Look on a
computer bulletin board near you for an archive called
CHG214.ZIP. (But note: If you're a UNIX super-techie, you won't
like CHG.EXE because its documentation and command set aren't
cryptic enough.)
(There are other programs which can handle this kind of job, too
- PC MAGAZINE's little program CHANGE.COM, for example. However,
CHANGE.COM - at least the version I've tried - has a 32K input
file size limitation. Not too useful if you have to deal with
files any larger than that. For smaller files, you might like it
because it's quite fast. CHG.EXE is slightly slower (and has tons
more features).)
Back to "highbit" characters for a moment: different versions of
SED deal with them in different ways. The version included with
this documentation can find highbit characters on a line but it
cannot *write* highbit characters. It will strip them down to their
lowbit equivalents every time. Irritating, but so it goes.
If you were to tell SED to replace some character with the one
having the ASCII decimal value of, say, 172 - the "1/4" fraction
symbol - SED will strip it right back down and you'll have not
the fraction but its lowbit equivalent (the character with the
decimal value of 44 - a comma). Rats. Oh, well ...
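The arithmetic behind that stripping is easy to check. This is
only a sketch in UNIX-style shell arithmetic, not anything SED
itself runs: clearing the high bit amounts to AND-ing the value
with 127.

```shell
# Assumption: POSIX shell arithmetic and printf.
# 172 is binary 10101100; AND with 127 (01111111) clears the top bit.
stripped=$((172 & 127))
echo "$stripped"

# Turn the number back into a character via an octal escape.
octal=$(printf '%03o' "$stripped")
char=$(printf "\\$octal")
echo "$char"
```

The result is 44, which is the comma.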
The GNU version of SED has this kind of problem if you run it one
way (call up a script file) and not another way (if you make the
substitution via a command given at the DOS command line).
SED'S DEFAULTS
If you do not in some way tell SED to process only a portion of
an input file, it will begin by reading the first line, then
process *every* line down to the end (but again: not past an end-
of-file marker).
Unless you tell SED via its "-n" command-line switch to suppress
printing, or somehow restrict the number of lines to process, SED
will print every line in the file - whether it alters the lines
or not. In some cases this could result in double-printing of the
lines.
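Here is a small sketch of that double-printing, assuming a
UNIX-style shell and a POSIX-compatible SED (single quotes instead
of the double quotes DOS needs). The two input lines are made up.
The "p" modifier asks SED to print a line on which a substitution
was made.

```shell
# Assumption: POSIX shell and sed; the input lines are invented.
input='hello world
nothing here'

# Without -n: the changed line is printed by "p" and again by default.
doubled=$(printf '%s\n' "$input" | sed 's/hello/goodbye/p')
# With -n: only the line printed by "p" appears.
quiet=$(printf '%s\n' "$input" | sed -n 's/hello/goodbye/p')

printf '%s\n' "$doubled" "---" "$quiet"
```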
If you don't give any command to tell SED to write its output to
a file - or tell DOS to redirect SED's output to a file - it will
print *only* to standard output.
SED is quite literal-minded about the case of letters. If you
tell it to search for "hello" it will not find "Hello" or "HELLO"
or "HellO." There are ways around this; they'll be discussed in
greater detail in the file REGEXP.1 (See also the example,
earlier in this file, showing how a single regular expression can
represent several different words).
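A quick sketch of that case-sensitivity, assuming a UNIX-style
shell and a POSIX-compatible SED rather than the DOS SED.EXE. The
three greetings are invented test lines.

```shell
# Assumption: POSIX shell and sed; the input lines are invented.
greetings='hello
Hello
HELLO'

# A plain string matches only the exact case given.
exact=$(printf '%s\n' "$greetings" | sed -n '/hello/p')
# A bracketed pair accepts either case for the first letter.
either=$(printf '%s\n' "$greetings" | sed -n '/[Hh]ello/p')

printf '%s\n' "$exact" "---" "$either"
```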
The program is also touchy about the case of the letters used in
its commands. *Always* type your commands in lower case unless this
documentation says otherwise. That is: virtually all command-line
switches and parts of editing commands (including the modifiers
and stand-alone command-line options discussed by and by) should
be in lower case.
If you have not told SED in a substitution command to replace
"globally," it will make a replacement only *once* on a line. More
about that shortly.
HOW INSTRUCTIONS ARE PROCESSED
As far as I can tell, SED processes instructions in the exact
order you've given them, whether you have given them on the
command line or via a script file. All editing instructions,
however given, are executed on each line of the input file before
the next line is processed, unless you in some way restrict the
work to specific lines rather than all lines.
Given the following:
sed -f test.sed -e s/yes/no/ inputfile
First the instructions in the script file TEST.SED are read and
executed; then SED executes the one following the "-e" switch.
The "-f" and "-e" switches will be discussed shortly.
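The effect of that ordering can be sketched like so. This assumes
a UNIX-style shell with mktemp and a POSIX-compatible SED; the
scratch script file is a hypothetical stand-in for TEST.SED and
holds a single made-up instruction.

```shell
# Assumption: POSIX shell, mktemp, and sed; the script is invented.
script=$(mktemp)
printf '%s\n' 's/yes/no/' > "$script"

# Script file first, then the -e command: yes -> no -> maybe.
file_first=$(printf 'yes\n' | sed -f "$script" -e 's/no/maybe/')
# Reversed: s/no/maybe/ finds nothing in "yes"; then yes -> no.
file_last=$(printf 'yes\n' | sed -e 's/no/maybe/' -f "$script")

rm -f "$script"
printf '%s\n' "$file_first" "$file_last"
```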
WRITING THE COMMAND LINE
There are three basic ways to write a SED command:
1) Put the entire set of instructions on the command line;
2) Refer to a SED script file - a plain ASCII file containing
SED editing instructions;
3) Combine editing instructions on the command line with those
stored in a script file.
There are some instances in which you must enclose editing
commands within double quote marks. Details on that shortly, but
in brief: any time an editing command contains spaces or certain
characters DOS considers as separators or actual commands, you
must enclose the editing instruction within double quotes (single
quote marks will *not* work). Such special characters include:
< > | , ;
If you use the replacement for COMMAND.COM called 4DOS (from J.P.
Software), and if a SED instruction contains a caret-mark ( ^ ),
you must enclose the instruction within quote marks whether or
not it contains spaces or other special characters. Either that,
or you should use 4DOS' "escape" character (by default, CTRL X if
you haven't changed it to something else) to tell 4DOS *not* to
process the caret as a command separator.
If you use C. J. Dunford's CED command-line editor program, and
if you intend to use SED a lot by giving commands on the command
line (as opposed to a script file), you should consider changing
CED's default command separator from a caret to something else.
Reason: I'm told CED has some way of faking DOS out - it prevents
DOS from reading the quote marks in the command line first, as
normally happens. Thus, the command separator will not be
interpreted as literal text even when it's enclosed within quotes
(not a problem if you're using the caret in a SED script file,
however).
So what's the big deal about a silly caret, anyway? Believe me,
if you intend to use SED with any regularity, the caret mark
becomes very important; it means "at the beginning of a line."
More about that in the file REGEXP.1.
Important note about quote marks within SED commands - not
meaning those used to delimit instructions, but those to be
treated literally *as* quote marks in the text. If you have quotes
to be treated as string literals, you must set them off in a
special way:
\"
Right: Put a backslash character to the immediate left (not
required for apostrophe characters or so-called back-quotes,
however).
THE BASIC FORM OF A SED COMMAND:
sed [switches / editing commands] inputfile > outputfile
where:
[switches / editing commands]
could be any number of different commands; where:
inputfile
is the name of the file you want to process. The input file name
must *always* be the last item on the command line - the last item
before any redirection or "pipe" commands, that is - and where:
> outputfile
is an optional command to redirect SED's output to a file whose
name is represented here by "outputfile." The input and output
file names can be the same - you can over-write the source file,
that is - but it's not safe to do so via the command shown above.
Do this instead:
type inputfile | sed [editing instructions] > inputfile
Send the contents of the source file to standard output via the
TYPE command; pipe that information to SED (which then functions
as a "filter"); finally, redirect SED's output to the original
file name.
Trying to overwrite the file any other way is liable to result
either in a trashed file or a zero-length file. I have found to
my surprise that even the following doesn't work reliably:
sed [editing instructions] < inputfile > inputfile
Strange. It works fine with something like:
sort < inputfile > inputfile
but it doesn't work all the time with SED. Forewarned is
forearmed ...
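On UNIX-style systems the TYPE-and-pipe trick shown above is
generally not safe either, because the shell may empty the output
file before anything has read it. A common alternative there (a
different technique from the one described in this file, sketched
here assuming a POSIX shell with mktemp and mv) is to write to a
scratch file and rename it afterward. The directory and file
contents below are invented.

```shell
# Assumption: POSIX shell, mktemp, mv, and sed; the data is invented.
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/inputfile"

# Write to a scratch file first; replace the original only if sed
# finished without an error.
sed 's/hello/goodbye/' "$workdir/inputfile" > "$workdir/inputfile.tmp" &&
    mv "$workdir/inputfile.tmp" "$workdir/inputfile"

contents=$(cat "$workdir/inputfile")
rm -rf "$workdir"
printf '%s\n' "$contents"
```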
No SED command will ever alter an input file unless you give an
instruction to DOS or to SED to over-write the original file.
If you redirect SED's output to a file instead of sending it to
standard output, you will not usually see the results of
processing on the screen at all. There is a way around this; it
will be dealt with later on.
Wildcards in the input file name: Yeah, you can try that.
Sometimes SED will play along and sometimes it won't. You'll just
have to experiment; I haven't figured out why it will support
wildcards on the command line at some times and not at others. I
have found that using *multiple* input file names on the command
line doesn't seem to work, either; this version of SED doesn't
have a command-line switch which means "the next text on the line
is another input file." Too bad, but so it goes.
COMMAND-LINE OPTIONS (SWITCHES)
SED command-line switches are two characters long; the first
character is always a hyphen. Unless otherwise noted, the switch
letters should *always* be lower case. There should not be a space
between the hyphen and the character which follows it. Unlike
some DOS programs, the version of SED included with this
documentation doesn't support the use of the forward slash ( / )
as an alternate switch character.
-n No automatic printing
As said, the program's default condition is to print every line
it reads - changed or otherwise. The "-n" switch will suppress
printing. If you use "-n," you will often need to use a "print
selected lines" command either on the command line or in a script
file. More on that later.
-e The next character string is an editing command
If you have more than one editing command on a line, you will
need to precede each of them with the "-e" switch.
Wrong: sed s/hello/goodbye/ s/yes/no/ inputfile
Right: sed -e s/hello/goodbye/ -e s/yes/no/ inputfile
But note the next two examples:
Right: sed -n s/hello/goodbye/p inputfile
Right: sed -n -e s/hello/goodbye/p inputfile
Why are they *both* right? The "-n" switch is not, itself, an
editing instruction. And then again:
Wrong: sed -n s/hello/goodbye/ -e s/yes/no/ inputfile
Right: sed -n -e s/hello/goodbye/ -e s/yes/no/ inputfile
In the latter examples there are two editing instructions on the
command line, and the "-e" must be used in front of both.
Put at least one space between the "-e" and the instruction.
Failure to use "-e" when it's needed will confuse SED; it might
think, for instance, that an editing command is the name of the
input file. This would be a dangerous sort of mistake if you were
trying to over-write the input file. SED would print an error
message and quit immediately, probably leaving only a zero-length
file on disk - kiss your input file goodbye.
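A minimal sketch of two "-e" instructions on one command line,
assuming a UNIX-style shell and a POSIX-compatible SED. The input
comes from a pipe instead of a named file, and the test line is
made up.

```shell
# Assumption: POSIX shell and sed; the input line is invented.
# Each editing instruction gets its own -e switch.
both=$(printf 'hello, yes\n' | sed -e 's/hello/goodbye/' -e 's/yes/no/')
printf '%s\n' "$both"
```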
-f The next item on the command line
is the name of a SED script file
There should be at least one space between the switch and the
name of the script file, which *can* contain path information.
SED doesn't assume a script file name has any particular
extension, so type its full name on the command line.
I have not been able to get SED to read script files when there
are wildcards in their names. That is:
sed -f script*.* inputfile
did not have the desired effect. You can try it, but you're on
your own. Perhaps GNU's version, or others' versions, work
differently.
-g Treat all substitution commands as
if the "g" modifier had been used
If you do not use the "g" (global) modifier in a substitution
command, SED will make a replacement on a given line only *once*.
If there are multiple occurrences of searched-for text, it will
replace only the first one. The "-g" switch tells the program to
behave as if you *had* given the "global" command in each and every
editing instruction. (More on "global," by and by.)
I suppose "-g" is useful if you have a script file whose editing
commands lack the "global" modifier. But then from time to time
you might want to insert the "global" modifier after all. The
"-g" switch is thus a way to get the desired effect without having
to alter your script files.
MAJOR COMMANDS
For the purposes of this documentation I'll separate SED's basic
editing commands into two main types, with several sub-
categories:
1) command-line instructions
a) substitution
b) deletion of lines
c) printing
d) writing to a file
e) miscellaneous weird commands
2) script file instructions
Scripts can include most of the command-line instructions, and a
lot more. See the file SFILES-A.1 for more information.
LIMITING THE SEARCH
As said, the program will find *every* line in a file if you don't
tell it to do otherwise. The following are general methods for
limiting the search. These locations within a file, however
specified, are referred to in most SED documentation as
"addresses." When you limit the search to certain "addresses,"
SED will usually *not* execute an editing command on lines which do
not meet the search criteria.
BY LINE NUMBER
A single integer "n" will limit a SED command to line "n":
The command:
5 command
would execute "command" on line 5 *only*. Whether or not you tell
SED to print the other lines in the file, the command will be
limited to line 5.
$ command
would execute "command" only on the *last* line of the file. Used
this way, the dollar sign is reserved to mean: "last line of file."
1 command
limits the execution of "command" to the first line of the file.
BY RANGE OF LINES
1,22 command
would execute "command" on lines 1 through and including 22, and
not past that point. Note: no spaces between the first number,
comma, and second number.
All such ranges, whether specified by line number or by other
methods to be shown below, must count from top to bottom. This:
34,13 command
won't work.
5,$ command
would execute "command" beginning on line 5, through and
including the last line of the file.
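The line-number addresses above can be tried out with a sketch
like this one. It assumes a UNIX-style shell and a POSIX-compatible
SED (hence the single quotes), uses "p" (print) as the command,
and runs on five invented lines.

```shell
# Assumption: POSIX shell and sed; the five lines are invented.
lines='one
two
three
four
five'

# -n suppresses default printing, so only addressed lines appear.
only_second=$(printf '%s\n' "$lines" | sed -n '2p')
only_last=$(printf '%s\n' "$lines" | sed -n '$p')
two_to_four=$(printf '%s\n' "$lines" | sed -n '2,4p')
four_to_end=$(printf '%s\n' "$lines" | sed -n '4,$p')

printf '%s\n' "$only_second" "$only_last" "$two_to_four" "$four_to_end"
```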
BY SPECIFIED TEXT
/hello/ command
would limit the execution of "command" to any and all lines
containing the string "hello." Given SED's case-sensitivity, the
expression /hello/ would not tell SED to find lines containing
"Hello" or "HELLO" unless they *also* contained "hello" in all
lower case characters.
Note how the expression has been surrounded (delimited) by a
forward slash to either side. This is important: a numeral used
to indicate a line number, or two numbers separated by a comma to
indicate a range of line numbers, or the "$" (last line) address,
should *not* be delimited by slash characters. Otherwise SED will
assume that the numbers or the dollar sign represent literal text
to be searched for in finding lines. If the instruction:
4,12 command
means "execute 'command' on lines 4 through/including 12," then
/4/,12 command
means: "execute 'command' starting with the first line found to
contain numeral '4' down to, and including, line 12."
BY A RANGE OF TEXT
/hello/,/goodbye/ command
would execute "command" beginning on the first line where SED
finds the string "hello," down to and including the *next* line on
which it finds the string "goodbye" - and *not* past that point.
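A sketch of a text range, assuming a UNIX-style shell and a
POSIX-compatible SED rather than the DOS SED.EXE. With that kind
of SED the range closes at the first "goodbye" after it opens, so
the second "goodbye" in this invented sample is left alone.

```shell
# Assumption: POSIX shell and sed; the sample text is invented.
text='start
hello
middle
goodbye
after
goodbye
end'

# Print from the first "hello" line through the next "goodbye" line.
by_text=$(printf '%s\n' "$text" | sed -n '/hello/,/goodbye/p')
printf '%s\n' "$by_text"
```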
COMBINING LINE NUMBERS AND TEXT
4,/goodbye/ command
would execute "command" beginning on line 4 of the file, down to
and including the *next* line where SED finds the string "goodbye."
Again, note how the first part of the address - the line number - is
not delimited by slash marks, but the expression "goodbye" *is*
surrounded by slashes.
/hello/,25 command
would execute "command" beginning on a line containing the string
"hello," down to and including line 25 of the input file.
/hello/,$ command
would execute "command" beginning on a line containing the string
"hello," down to and including the last line of the file.
There should never be a space between the first number,
character, or expression which begins a range, the comma which
separates one side of the range from the other, and the second
element (number, character, or expression) in the range.
However, in many cases the space can be added after the range.
Both of the following would be legal:
14s/hello/goodbye/g
14 s/hello/goodbye/g
Both say: "Do the substitution *only* on line 14."
Similarly, the following two instructions are legal:
5,$s/hello/goodbye/g
5,$ s/hello/goodbye/g
as are the following two:
/hello/,/goodbye/ s/yes/no/g
/hello/,/goodbye/s/yes/no/g
There is one exception to this business of being able to include
or not include spaces following the address range. More on that
in a bit.
Up to now the expressions shown in these addresses have been
simple string literals. They do not have to be limited to simple
expressions, though. Example: the use of a regular expression in
an address range:
/^[Hh]ello/,25 s/yes/no/g
to search for lines which begin either with "H" followed by
"ello," or by "h" followed by "ello" - and continue the
substitution from that point down to, and no farther than, line
25 in the file.
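The same idea on a smaller scale, as a sketch: a regular
expression opens the range and a line number (3 here, not 25)
closes it. This assumes a UNIX-style shell and a POSIX-compatible
SED; the four input lines are invented.

```shell
# Assumption: POSIX shell and sed; the input lines are invented.
text='Say hello
hello: yes
maybe yes
still yes'

# Line 1 contains "hello" but does not *begin* with it, so the
# range opens on line 2 and closes on line 3.
limited=$(printf '%s\n' "$text" | sed '/^[Hh]ello/,3s/yes/no/g')
printf '%s\n' "$limited"
```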
SPECIFYING EXCEPTIONS
What about a command you want to execute everywhere EXCEPT on a
certain line or range of lines? SED uses the character "!" to
mean "except for":
1,2!command
This says: execute "command" everywhere *except* on lines 1 and 2.
The only good way to illustrate this kind of instruction is to
show how it's used with, say, substitution commands. Since I
haven't talked much about those yet, I'm going to sneak
"specifying exceptions" into that part of the documentation -
read on - though I will make a few more points about it right
here.
SED uses a special kind of expression to indicate a blank line:
^$
This means: "beginning of line, followed immediately by end of
line" (i.e., a line with nothing in it but the line boundary
characters themselves).
The following expression ("don't operate on blank lines") can be
used in an "except for" instruction, though I have not yet
figured out how to tell SED to recognize it in other kinds of
instructions for a range of addresses:
/^$/!command
That will do the trick. Here is - how appropriate - an exception to
the rule about enclosing *only* strings of text within the /
characters: if you use the above kind of instruction - meaning
"perform 'command' on all lines *except* blank lines" - you must
enclose the expression within the slash marks.
And, using the $ character in its different meaning as "last
line of file," you can specify "except on last line of file" this
way:
$!command
More on specifying "except for" addresses shortly.
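The three "except for" forms shown so far can be sketched like
this, assuming a UNIX-style shell and a POSIX-compatible SED
(single quotes also keep the shell from touching the "!"). The
four input lines, one of them blank, are invented.

```shell
# Assumption: POSIX shell and sed; the input lines are invented.
text='first

third
last'

# Print every line except blank ones.
not_blank=$(printf '%s\n' "$text" | sed -n '/^$/!p')
# Change the first "t" to "T" everywhere except on the last line.
not_last=$(printf '%s\n' "$text" | sed '$!s/t/T/')
# Make the same change everywhere except on lines 1 and 2.
not_top=$(printf '%s\n' "$text" | sed '1,2!s/t/T/')

printf '%s\n' "$not_blank" "---" "$not_last" "---" "$not_top"
```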
Now, on to the major editing commands:
SUBSTITUTION
Meaning: search and replace. Here is the basic form of the
instruction:
s/search/replace/[optional modifiers]
Separating out the parts:
s/ Begins the instruction
search The string of text to search for (could be a regular
expression)
/ Separates the "search for" side of the editing
instruction from the "replace with" side; begins the
next part of the instruction.
replace The text to replace "search" with
/ Ends the "replace with" side of the instruction; if
there are no modifiers, ends the instruction itself
[optional modifiers] (see below)
The example shown above, if there are no modifiers in it, means:
find the *first* occurrence of "search" on any line in the file; if
it's found, change it to "replace." If it occurs again on the same
line, ignore it and go on to process the next line of the file.
Given on the command line, the instruction would look like this:
sed s/search/replace/ inputfile
Given in a script file, however, the instruction would look like:
s/search/replace/
No other parts of the command line - switches or input file name
- would be included in the script file instruction line. And
(important!) anything which has to be bracketed (surrounded -
a.k.a. "delimited") by double quotes on the command line does *not*
have to be bracketed by double quotes in a script file.
Another example: search for a string that includes a space or
some special character:
sed "s/hello, there/goodbye/g" inputfile
Quote marks would have to be used so that DOS doesn't think
either the comma or the space after it should be passed to SED as
a separation between commands; the quotes keep the entire
instruction together as a *single* command.
What if you want to *delete* a character string?
sed "s/please delete these words//" inputfile
would remove "please delete these words" - right: the replacement
string is simply one / followed immediately by another /
character. Translation: replace the "found string" with *nothing.*
If the string you want to remove might occur more than once in a
line, don't forget the "global" modifier (discussed in detail,
shortly):
sed "s/please delete these words//g" inputfile
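A short sketch of deleting a string with and without the "global"
modifier, assuming a UNIX-style shell and a POSIX-compatible SED
(so single quotes stand in for the double quotes DOS needs). The
test line is made up.

```shell
# Assumption: POSIX shell and sed; the input line is invented.
line='a very very long line'

# Empty replacement, no modifier: only the first "very " goes.
once=$(printf '%s\n' "$line" | sed 's/very //')
# With g: every "very " on the line goes.
every=$(printf '%s\n' "$line" | sed 's/very //g')

printf '%s\n' "$once" "$every"
```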
INDICATING A RANGE, using some of the methods shown above:
sed 1,5s/hello/goodbye/ inputfile
Find "hello" and replace it with "goodbye," *only* on lines 1
through/including 5. But note:
sed "1,5 s/hello/goodbye/" inputfile
Double quotes again. Why? You've separated the "1,5" from the
rest of the command with a space. To keep the whole thing
together, as it were, you must use the double quotes even though
the editing instruction itself doesn't contain anything to
require using them.
sed /hello/,/goodbye/s/yes/no/ inputfile
Starting on a line containing "hello" and continuing down to and
including, but no farther than, a line containing "goodbye," find
"yes" and replace it with "no." No quote marks needed. But note:
sed "/hello/,/goodbye/ s/yes/no/" inputfile
or:
sed "/hello there/,/goodbye/s/yes/no/" inputfile
There's a space within the expression that's used to indicate the
start of the range - so you must quote the *entire* instruction.
Likewise:
sed "/hello/,/bye now/s/yes/no/" inputfile
sed "/hello/,/goodbye/ s/yes/no/" inputfile
A few examples of selective address specification:
Say the input file looks like this:
Here is line one.
This is the second line.
Line three looks like this.
Here's the last line of the file.
The instruction:
sed 1,2s/e/X/g inputfile
(replace a lower case "e" with a capital "X" - globally) would
result in:
HXrX is linX onX.
This is thX sXcond linX.
Line three looks like this.
Here's the last line of the file.
The substitution is restricted to lines one and two. All lines
are printed, but only the two specified addresses are changed.
("Globally" doesn't necessarily mean "everywhere in the file."
Read on ...) The command:
sed /second/,/like/s/e/X/g inputfile
would result in:
Here is line one.
This is thX sXcond linX.
LinX thrXX looks likX this.
Here's the last line of the file.
You told SED to begin making the substitution on the first line
found to contain "second" and continue making the substitution on
all lines through/including - but not past - one which contains
the string "like."
sed $s/e/X/g inputfile
would result in:
Here is line one.
This is the second line.
Line three looks like this.
HXrX's thX last linX of thX filX.
You told SED to restrict the substitution to the file's last line
*only*. The instruction:
sed 3!s/e/X/g inputfile
would result in:
HXrX is linX onX.
This is thX sXcond linX.
Line three looks like this.
HXrX's thX last linX of thX filX.
You told SED (using the "except for" modifier - the !
character) to replace "e" with "X" on every line *except* line
three of the input file. The instruction:
sed "/^This/ s/e/X/g" inputfile
would result in:
Here is line one.
This is thX sXcond linX.
Line three looks like this.
Here's the last line of the file.
You told SED to restrict the substitution to lines *beginning* with
the character string "This" (note the use of the ^ character).
Only one "address" in the file meets the criterion.
Try it with any number of ways of specifying addresses, as
discussed above. It's downright mind-boggling how many ways there
are to tell SED to find, or *not* to find, lines in a file.
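The address examples above can be tried in one go with a sketch like the following. It assumes a UNIX-style shell (hence the single quotes) and pipes the four-line sample file in on standard input; the function and variable names are only illustrative.

```shell
# The four-line sample file from the text.
sample() {
  printf '%s\n' \
    'Here is line one.' \
    'This is the second line.' \
    'Line three looks like this.' \
    "Here's the last line of the file."
}

by_number=$(sample | sed '1,2s/e/X/g')            # lines 1 through 2
by_text=$(sample | sed '/second/,/like/s/e/X/g')  # from "second" to "like"
last_only=$(sample | sed '$s/e/X/g')              # last line only
all_but_3=$(sample | sed '3!s/e/X/g')             # every line except 3
anchored=$(sample | sed '/^This/ s/e/X/g')        # lines beginning with "This"

printf '%s\n\n' "$by_number" "$by_text" "$last_only" "$all_but_3" "$anchored"
```

Each variable holds the whole file as SED printed it, so it is easy to compare one form of address against another.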
SPECIFYING EXCEPTIONS - PART II
If this is the input file:
Hello, how are ya?
What business is it of yours how I am?
Well, in that case I don't care how you are.
Fine by me, turkey.
and this is the SED command:
sed 1,2!s/./X/g inputfile
the result is:
Hello, how are ya?
What business is it of yours how I am?
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXX
The substitution command uses the period as a "wildcard" meaning
*any* single character. The instruction says: "everywhere on each
line *except* lines 1 and 2, replace any character with a capital
'X.'"
Note, in this case, how there is no space between the "!"
character and the "s" which begins the substitution command. You
should NOT put a space in there; it's yet another exception (sigh
...) to some of the rules about using spaces in commands which
precede substitution commands. Maybe it's a bug in this version
of SED - beats me. But here you can't have a space to the right
of the "!". If you want to include a space, you have to do it
this way:
sed "1,2 !s/./X/g" inputfile
and on the command line, double quote marks would have to delimit
the entire substitution instruction, including the "except for"
part. Notice how the space was located: to the *left* of the "!".
As with other address specifications, you can specify "except
for" by reference to text on a line, not merely by line numbers:
sed /how/,/case/!s/./X/g inputfile
Result:
Hello, how are ya?
What business is it of yours how I am?
Well, in that case I don't care how you are.
XXXXXXXXXXXXXXXXXXX
That is: except for lines found to contain "how," and down to the
first line which contains "case" - but no farther than that -
find any character and replace it with "X."
As it turned out the range of lines to be *omitted* from the
operation included all but the last line. And again, you can
combine the two types of "address" instructions:
sed /case/,$!s/./X/g inputfile
Meaning: except for a line containing the word "case" and then
all lines following it, down to the end of the file, substitute
an "X" for any character on each line.
Result:
XXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Well, in that case I don't care how you are.
Fine by me, turkey.
The number of possible ways of noting exceptions in an address
specification is, as you can imagine, pretty vast. Experiment,
experiment ... and remember that the "except for" statement can
include regular expressions, not just simple ones like those used
up to now in the examples.
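Here is a small sketch of the "except for" examples, again assuming a UNIX-style shell with the input piped in. The last input line is taken to end in "turkey." so that it matches the printed results above; the function and variable names are made up.

```shell
# The four-line conversation used in the examples.
talk() {
  printf '%s\n' \
    'Hello, how are ya?' \
    'What business is it of yours how I am?' \
    "Well, in that case I don't care how you are." \
    'Fine by me, turkey.'
}

except_12=$(talk | sed '1,2!s/./X/g')             # all but lines 1 and 2
except_range=$(talk | sed '/how/,/case/!s/./X/g') # all but "how" through "case"
except_tail=$(talk | sed '/case/,$!s/./X/g')      # all but "case" through the end

printf '%s\n\n' "$except_12" "$except_range" "$except_tail"
```

In an interactive shell that treats "!" specially, the single quotes keep it from being touched before SED sees it.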
MODIFIERS FOR THE SUBSTITUTION COMMAND
"g" modifier - global replacement
I've mentioned already that by default SED replaces only the
*first* occurrence of a string on each line, unless you tell it to
replace the string *every* time it occurs on a line.
s/hello/HIYA/
If the input file looks like this:
hello goodbye hello goodbye
hello goodbye hello goodbye
The instruction just shown will result in:
HIYA goodbye hello goodbye
HIYA goodbye hello goodbye
But *with* the "g" modifier:
HIYA goodbye HIYA goodbye
HIYA goodbye HIYA goodbye
(Some GNU versions of SED that you might encounter are plagued by
a bug whereby neither the "g" modifier nor the "-g" command-line
switch works at all. The only workaround for it that I know of is
to create a "loop" within a script file. See file SFILES-A.1 for
more about looping.)
Normally when we say "global" in talking about changing a file,
we mean "changes throughout the entire file." That's not the way
to think about it with SED, however. To SED, "global" means
"everywhere on a given line" - which is a lot different, and it's
important to keep the distinction in mind.
Given an instruction like this:
s/hello/goodbye/g
then for all intents and purposes "global" means "everywhere in
the file." But, given an instruction like this:
25,$ s/hello/goodbye/g
then "global" does *not* mean "absolutely, positively everywhere."
Rather, it means: "all occurrences on every line from line 25
down to and including the last line of the file." Big difference.
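To see the difference between "everywhere on a line" and "everywhere in the file," here is a minimal sketch for a UNIX-style shell. It reuses the two-line "hello goodbye" input; since that file has only two lines, the address "2,$" stands in for the "25,$" of the example, which is purely an assumption made to keep the sample short.

```shell
# Two identical lines, as in the "g" modifier example.
greet() {
  printf '%s\n' \
    'hello goodbye hello goodbye' \
    'hello goodbye hello goodbye'
}

once_per_line=$(greet | sed 's/hello/HIYA/')     # first match on each line
whole_line=$(greet | sed 's/hello/HIYA/g')       # every match on each line
from_line_2=$(greet | sed '2,$ s/hello/HIYA/g')  # every match, but only on lines 2 to end

printf '%s\n\n' "$once_per_line" "$whole_line" "$from_line_2"
```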
"p" modifier - PRINT SELECTED LINES
Suppose this is the input file:
hello goodbye hello goodbye
hello goodbye hello goodbye
and this is the editing instruction:
sed s/hello/HIYA/p inputfile
Most versions of SED will give you the following result:
HIYA goodbye hello goodbye
HIYA goodbye hello goodbye
HIYA goodbye hello goodbye
HIYA goodbye hello goodbye
Right: the lines will be *duplicated.* This is because you haven't
told SED to suppress automatic printing via the "-n" switch. So
it will print the file, with changes, *and* will reprint the lines
it changed via the substitution command.
Curiously, the version of SED I've included *doesn't do this*.
Strange! If it's a bug, I don't object to it. (Maybe it's the
*other* versions of SED which are buggy. Beats me.) However, who
knows if it'll *always* behave that way? As always: experiment
before you over-write a source file!
I'll pretend for a moment that there are situations in which the
SED version I've passed along will duplicate lines when the "p"
modifier is used as shown above. Here is where the "-n" switch
can come in handy - in a command like:
sed -n s/hello/HIYA/p inputfile
the effect would be to print *only* the lines that had changed -
not the entire file. If you want to print the whole file,
including the changed lines, do not use either the "-n" switch or
the "p" modifier.
In general, "p" in a substitution command is not very meaningful
or useful unless the "-n" switch has also been used on the
command line.
Combining "g" and "p": so far, the examples using "p" have changed
only the first occurrence of "hello" on each line. If the command were:
sed -n s/hello/HIYA/gp inputfile
the result on each line would be:
HIYA goodbye HIYA goodbye
as you'd expect from the use of the "g" modifier.
*** Note: SED *demands* that if you are going to combine the "g"
and "p" modifiers this way, *the "g" must come first*. Otherwise,
SED will halt with an error message. (Well, it's not very bright,
you know ...)
The combination of "-n" and "gp" tells SED to print *only* lines
which are changed.
If the input file looks like this:
He said "hello."
She said "goodbye."
Neither said "Aardvark."
Only one said "hello" and a sorry "hello" it was.
Last line of the file.
Then the following command:
sed -n s/hello/YUCK/p inputfile
would result in the following, and *only* the following, lines
being printed:
He said "YUCK."
Only one said "YUCK" and a sorry "hello" it was.
The "g" modifier wasn't used; "hello" is changed only once on
each line where it occurs. If "g" is used in addition to "p":
sed -n s/hello/YUCK/gp inputfile
The result is:
He said "YUCK."
Only one said "YUCK" and a sorry "YUCK" it was.
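A quick sketch of "-n" combined with "p" and "gp," assuming a UNIX-style shell and piped input. The fourth input line is written so that it matches the printed results (two "hello"s joined by "and"); the names are illustrative only.

```shell
# Five lines, two of which contain "hello".
quotes() {
  printf '%s\n' \
    'He said "hello."' \
    'She said "goodbye."' \
    'Neither said "Aardvark."' \
    'Only one said "hello" and a sorry "hello" it was.' \
    'Last line of the file.'
}

# "-n" silences automatic printing; "p" prints only lines that changed.
changed_once=$(quotes | sed -n 's/hello/YUCK/p')
changed_all=$(quotes | sed -n 's/hello/YUCK/gp')

printf '%s\n\n' "$changed_once" "$changed_all"
```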
"w" modifier - WRITE FILE
Earlier I talked about using the DOS redirection character ( > )
to write output to a new file. When you use it, SED's output will
go straight to the file; you won't see anything on the screen.
But, using "w," you can have it both ways.
Suppose you want to replace "hello" with "GOODBYE" in a file
called TEST.DOC, viewing the changes at the same time that SED
creates a new file called NEW.DOC:
sed "s/hello/GOODBYE/w new.doc" test.doc
Or - adding the "global" operator to the substitution command:
sed "s/hello/GOODBYE/gw new.doc" test.doc
Even though the substitution command itself doesn't contain
spaces or special characters, the part of the command reading "w
new.doc" *does* contain a space - so you need to put the entire
instruction within double quotes. In this case the part reading
"w new.doc" is supposed to remain connected to the substitution
command, as it were.
*** IMPORTANT: In the version of SED included with this
documentation, as with the combination of the "g" and "p"
operators, here the "g" must PRECEDE the "w."
In carrying out this instruction, SED will create NEW.DOC from
scratch if it doesn't now exist or over-write it if it does
exist. Actually, what SED does is first to delete any existing
file by the name NEW.DOC, *then* write it from scratch. An error on
the command line would cause an existing NEW.DOC to be re-
created, initially weighing in at 0 bytes, but then perhaps no
NEW.DOC containing any text would be created. So exercise
caution, there ...
The command:
sed "s/hello/GOODBYE/gw new.doc" test.doc
has an interesting effect. While it causes the entire file to be
printed to standard output, it does *not* write the entire file,
containing changes, to the name NEW.DOC. Rather, it writes only
the *changes* to NEW.DOC. It is therefore equivalent to the
command:
sed -n "s/hello/GOODBYE/gp" test.doc > new.doc
The difference is that the command shown immediately above doesn't
allow anything to be displayed on the screen.
If that isn't what you want, use "w" as a stand-alone instruction
on the command line. Details shortly.
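The behavior of "w" as a modifier can be checked with a sketch along these lines. It assumes a UNIX-style shell; the scratch directory and the three sample lines are made up, while the names test.doc and new.doc follow the example.

```shell
workdir=$(mktemp -d)   # scratch directory, removed at the end
printf '%s\n' 'hello there' 'nothing to see' 'hello again' > "$workdir/test.doc"

# Standard output gets every line; new.doc gets only the changed ones.
on_screen=$(sed "s/hello/GOODBYE/gw $workdir/new.doc" "$workdir/test.doc")
in_file=$(cat "$workdir/new.doc")

# The "-n" plus "p" form, which produces the same lines as new.doc holds.
redirected=$(sed -n 's/hello/GOODBYE/gp' "$workdir/test.doc")

printf 'screen:\n%s\n\nnew.doc:\n%s\n' "$on_screen" "$in_file"
rm -rf "$workdir"
```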
You *can* write to a device name, such as LPT1: (parallel printer)
using the "w" modifier:
sed "s/hello/goodbye/gw lpt1" inputfile
As with writing a file to disk, the effect would be to send *only*
the altered lines to the printer. Note: the version of SED
included with this documentation requires that you *omit* the colon
in the device name. Otherwise it will respond: "Can't open LPT1:"
and terminate at once. If you use "w" as a stand-alone item on
the command line, SED treats device names much differently (be
patient. I'll get to that). Keeps you on your toes, eh?
That's different from how it works if you use redirection. The
command would be:
sed s/hello/goodbye/g inputfile > LPT1:
DELETION OF TEXT
SED uses the letter "d" to delete lines. Thus:
sed 1d inputfile
would delete the first line of the input file.
sed 1,12d inputfile
would delete lines 1 through/including 12.
sed 43,$d inputfile
would delete lines 43 through/including the last line of the file.
sed 15!d inputfile
would delete all *but* line 15. Same result as if you'd given this
command:
sed -n 15p inputfile
The command:
sed 23,55!d inputfile
would delete all *but* lines 23 through/including 55. The command:
sed /hello/d inputfile
would delete *only* lines on which the word "hello" occurs. The
command:
sed /hello/,/goodbye/d inputfile
would delete all lines from the first one where "hello" occurred,
down to and including - but no farther than - the first one where
"goodbye" occurred. The command:
sed 1,$!d inputfile
would be downright silly. It says, "*except* for *every* line in the
file, delete all lines in the file." The result would be to print
everything. Nah, make it easy on yourself. Use "TYPE inputfile"
instead.
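A few of the deletions above, sketched for a UNIX-style shell. The five-line input (one word per line) and the line numbers are stand-ins chosen to keep the sample small; they are not from the examples themselves.

```shell
# Five one-word lines: one, two, three, four, five.
five() { printf '%s\n' one two three four five; }

keep_3=$(five | sed '3!d')                # delete all *but* line 3
print_3=$(five | sed -n '3p')             # same result via "-n" and "p"
drop_range=$(five | sed '/two/,/four/d')  # delete from "two" through "four"
drop_tail=$(five | sed '4,$d')            # delete line 4 through the last line

printf '%s\n\n' "$keep_3" "$print_3" "$drop_range" "$drop_tail"
```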
You have probably long since fallen asleep, but if you haven't
you might recall my mentioning the special expression:
/^$/
to mean "blank line." SED can rapidly delete blank lines as
follows:
sed "/^$/d" inputfile
The quotes are required to set off the caret mark as a special
character only if you're using 4DOS, or something else which uses
" ^ " as a command separator (see also the information about CED,
above).
If a line consists only of spaces - even just one - SED will not
consider it a blank line. If you want to get rid of it, you'll
need two instructions:
sed -e "/^ \{1,\}$/d" -e "/^$/d" inputfile
The first instruction tells SED to find the beginning of a line,
followed by one or more spaces, followed immediately by the end
of the line - and delete any such line. The second instruction
kills lines which really *don't* have any characters in them. (The
\{ \} construction is discussed in the REGEXP.1 file.) You could
also have done it this way:
sed "/^ *$/d" inputfile
Because "*" means "zero or more," that single instruction catches
both the lines made up only of spaces and the truly empty ones.
Again, see REGEXP.1 for more information about the use of the "*"
character.
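Here is a minimal sketch of the two kinds of "blank" line, assuming a UNIX-style shell and made-up input. Note the "$" that pins each pattern to the end of the line; without it, a pattern such as "/^ */" would match every line in the file.

```shell
# Line 2 is truly empty; line 4 holds three spaces and nothing else.
ragged() { printf '%s\n' 'first' '' 'second' '   ' 'third'; }

only_empty=$(ragged | sed '/^$/d')    # removes the empty line only
spaces_too=$(ragged | sed '/^ *$/d')  # removes empty and spaces-only lines

printf '[%s]\n' "$only_empty" "$spaces_too"
```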
MODIFIERS WHICH CAN ALSO BE USED AS STAND-ALONE
INSTRUCTIONS ON THE COMMAND LINE
You've already seen some of the following single-character commands
used as modifiers for a substitution instruction. But they can also
be used in their own right on the command line.
Note: they are *not* command-line switches. You do not precede them
with a hyphen. However, when used as additional editing
instructions, in some circumstances they will have to be preceded
by the "-e" switch. Example of a command line in which "w" is
used by itself as an instruction:
Wrong: sed s/hello/goodbye/g w new.doc inputfile
Right: sed -e s/hello/goodbye/g -e "w new.doc" inputfile
(Because you *have* to put at least one space between "w" and the
output file name, that kind of command requires quote marks.)
"p" - PRINT SELECTED LINES
You've seen how "p" can modify a substitution command. It can
also be used to print only certain lines in a file. Again, using
it *without* the "-n" switch will usually produce undesirable
results.
sed p inputfile
would *double print* every line in the file - including blank lines.
sed -n 1p inputfile
would print *only* the first line of the file.
sed -n 1,15p inputfile
would print *only* lines 1 through/including 15.
sed -n 37,$p inputfile
would print *only* lines 37 through/including the very last line of
the file.
sed -n /hello/,/goodbye/p inputfile
would print beginning from a line containing "hello" down to and
including, but no farther down in the file than, a line
containing "goodbye."
sed -n "/hello/,/goodbye/ p" inputfile
This has the same effect as the example immediately above it, but
there is a space between the end of the address range and the
"p." To tell DOS to keep the whole command together so that SED
can interpret it properly, there you'd have to use the double
quote marks.
You can use any number of ways of specifying addresses in the
file - as discussed above under "LIMITING THE SEARCH." This
includes specifying *exceptions.* Try the following command using
this file for the input:
sed -n /e/!p inputfile
It will have a most interesting effect. It says: "Print nothing
automatically. Now, print lines *except* those containing a
lower-case 'e.'" Needless to say, you won't see many output lines
on the screen. And, uh, you might not want to over-write the
file.
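The stand-alone "p" examples can be sketched like this on a UNIX-style shell. The five-line input and the line numbers are assumptions picked to keep things short.

```shell
# Five one-word lines: one, two, three, four, five.
five() { printf '%s\n' one two three four five; }

doubled=$(five | sed p)           # no "-n": every line comes out twice
picked=$(five | sed -n '2,3p')    # only lines 2 through 3
no_e=$(five | sed -n '/e/!p')     # only lines *without* a lower-case "e"

printf '%s\n\n' "$doubled" "$picked" "$no_e"
```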
"q" - WRITE AND QUIT - LIKE, PRONTO
The SED docs I've looked through are quite terse about this one.
As far as I can tell, its sole purpose is to print specified
lines in a file and then quit immediately. This would be useful
if you were trying to print only a few lines from a very long
file, but you stood to wait a fair length of time while SED got
done reading to the end of the file (even though it might long
since have stopped printing lines).
This command supports only a single address specification rather
than a range of lines, and the meaning of the "address" you give
it is different from what has been discussed so far.
sed 25q inputfile
would print lines 1 through 25 of "inputfile" and then
immediately terminate SED. The command:
sed 1,25q inputfile
would be an error and SED would halt immediately with the error
message "Only one address allowed: 1,25q".
If "inputfile" were big, the above command would return you to
the DOS prompt faster than:
sed -n -e 1,25p inputfile
A command like:
sed "/this line/q" inputfile
would tell SED to print from the beginning of the file through
and including the *first* line it encounters containing the
character string "this line" - then it would quit without reading
or printing any of the rest of the file. The command:
sed $q inputfile
would be the same as typing, at the DOS prompt: TYPE INPUTFILE -
but TYPE is much faster. The command:
sed $!q inputfile
would be amusing for all of about two seconds - until, that is,
you realized I'd just given you another hopelessly silly example
to try out.
SED assumes that "q" alone - no line number (or other location in
the file) specified - refers to line 1 of the file. So:
sed q inputfile
would print only line 1 of "inputfile" to the screen - identical
to the command:
sed 1q inputfile
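A short sketch of "q," assuming a UNIX-style shell and a made-up five-line input. The point of "q" is speed on big files, which a five-line sample can't show; this only demonstrates which lines come out.

```shell
# Five one-word lines: one, two, three, four, five.
five() { printf '%s\n' one two three four five; }

first_two=$(five | sed 2q)            # print lines 1 through 2, then quit
up_to_three=$(five | sed '/three/q')  # print through the first line containing "three"
just_one=$(five | sed q)              # no address: quits after line 1
same_lines=$(five | sed -n '1,2p')    # same output as "2q", but reads the whole input

printf '%s\n\n' "$first_two" "$up_to_three" "$just_one"
```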
"w" - "WRITE FILE"
Like "p," this one can be used as a modifier in a substitution
command and also as a standalone instruction on the command line.
You have already seen how, when used in a substitution
instruction, "w" as a modifier will write *only changed lines* to a
file, even though your command might result in the entire file's
being displayed on the screen.
If this isn't what you want, use "w" as a standalone command:
sed -e s/hello/GOODBYE/g -e "w new.doc" inputfile
There *must* be at least one space between the "w" and the name of
the file you want SED to write. Again, since the second one - the
"write file" instruction - contains a space, it must be delimited
with double quote marks.
As with "w" when it is used to modify a substitution command, "w"
as a stand-alone command will gladly over-write an existing file.
So watch it, there. The following sort of command would be a
disaster:
sed -e s/hello/goodbye/g -e "w new.doc" new.doc
Do *not* use the same input and output file names in a case like
this! You've been warned ...
As before, the source file's name appears *last* on the command
line. Note how, as in other examples, both editing commands must
be preceded by the "-e" command-line switch.
The latter form of the command writes the *entire* source file,
with changes, to the new file NEW.DOC. Again: if "w" is used as a
modifier in a substitution command, however, it writes only the
changed lines. Beats me why they wanna have their syntax this way
(whoever "they" are) ...
"w" will supersede the "-n" switch in terms of what is written to
disk, though it will not have an effect on the screen display as
far as I can tell. The command:
sed -n -e s/hello/goodbye/gp -e "w new.doc" inputfile
would cause a display *only* of the changed lines on the screen.
However, the *entire* file would be re-written to disk as NEW.DOC,
with the changes in it. This is pretty much the *reverse* of how
"w" works when used as a modifier for a substitution command.
You can specify a device name like LPT1 instead of a file name
when you use "w" as a stand-alone command. As before, you must
*omit* the colon, or SED won't be able to send the information to
the printer. It will simply respond: "Can't open LPT1:".
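To see the stand-alone "w" at work alongside "-n," here is a sketch for a UNIX-style shell. The scratch directory and sample lines are assumptions; the file names follow the examples.

```shell
workdir=$(mktemp -d)   # scratch directory, removed at the end
printf '%s\n' 'hello there' 'nothing to see' 'hello again' > "$workdir/test.doc"

# "-n" with "gp" shows only changed lines on standard output;
# the separate "w" instruction still writes every line to new.doc.
on_screen=$(sed -n -e 's/hello/GOODBYE/gp' -e "w $workdir/new.doc" "$workdir/test.doc")
in_file=$(cat "$workdir/new.doc")

printf 'screen:\n%s\n\nnew.doc:\n%s\n' "$on_screen" "$in_file"
rm -rf "$workdir"
```

The input and output files are deliberately different here, for the reason given in the warning above.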
"l" - AN ECCENTRIC "LIST" COMMAND
(That's a lower case "L," not a numeral one)
I have three different versions of SED. Each one behaves in a way
completely different from the others when I give the "l" command.
Here's what the one enclosed with this documentation does:
If you say:
sed l inputfile
The input file scrolls down the screen with each line *duplicated*.
This would be identical to what happens if you give the command
"sed p inputfile," except for one important difference: lines
which extend past column 71 are broken and wrapped to the next
line. If the text on the line ends this way:
and here is the ending of this line.
and if the "d" in ending is in column 72, the "l" command changes
it as follows:
and here is the en\
ding of this line.
and here is the en\
ding of this line.
SED puts a \ character at column 73 and wraps the remainder of
the text to the next line down. If the command is given in
conjunction with the "-n" command-line switch, the double-
printing of lines doesn't occur; this odd word-wrapping does
occur, and you do see the results on the screen despite the "-n"
switch.
If CTRL characters appear in the file, SED represents some of
them non-literally, via numbers, as in: \001 for CTRL A, \002 for
CTRL B, and so on. The bad news is, this version of SED doesn't
properly represent all of the CTRL characters. Wonderful ...
If you try to get it to represent a highbit character via numeral
this way, with the "l" command, the program prints all kinds of
bizarre stuff having nothing to do with the character. Would you
believe: the string "\batch\whz.bat" substituted for a highbit
character? There is indeed a file on my hard disk called WHZ.BAT,
and it's in C:\BATCH. How did SED get that file name when I'm
logged onto my *RAM* disk when doing these tests? How did it know
about the existence of C:\BATCH? Who knows? Who cares?
Are we having the fun part yet?
Conclusion: when it comes to symbolically representing CTRL or
highbit characters, this "feature" is a complete bust, though it
does work properly in GNU's version of SED.
So "l" might be useful in dealing with extra-long lines in a
file. What you'd then do with the oddly word-wrapped lines, I
don't know. This SED version does seem to be able to break lines
no matter how long they are, wrapping them at column 72. In my
limited tests of the feature, it has worked with lines as long as
3500 characters.
"G" - APPEND "NEWLINE CHARACTER"
The "G" command (note: this one *must be capitalized*) is an
instruction most often used in SED script files. Alas, it's one
of the ones which is so badly explained in the SED docs I have, I
really haven't been able to figure out just what to do with it,
nor with other commands of its type.
By "type" I mean commands which manipulate SED's work space *and*
its hold space - the hold space being a second memory buffer into
which you can tell SED to "push" information.
The terse explanation for "G" is: "Append a "newline" character
(CR plus LF) to the work space. Then, append to the work space
the present contents of the hold space." Remember, as SED reads
each new line, the work space is empty, and the hold space is empty
too unless you have already put something into it. After
reading the line, SED pushes its contents, *not* including the line
boundary characters themselves, into the work space.
"G" adds an additional newline character into the work space. If
you haven't already told SED to "push" some information into the
hold space before you give the "G" command, then the hold space
has nothing to offer. Therefore, the work space contains the
current line's contents, plus an extra "newline" character. This
is about the only use I've found for "G" so far:
sed G inputfile
will rapidly double-space the file. "G" without any address
specification, then, means the same as: 1,$G. I dunno ... I
suppose this kind of command could come in handy some time. The
command:
sed 1,30G inputfile
will double-space from line 1 through line 30 - and then print
the rest of the file with whatever spacing it now has. The
command:
sed 45,$!G inputfile
would add an extra line below every line *except* lines 45
through/including the last line of the file. The command:
sed 30G inputfile
Adds an extra line following line 30 and only following line 30.
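A minimal sketch of "G" with an empty hold space, assuming a UNIX-style shell (where the newline is a lone LF rather than CR plus LF). The three-line input and the address "2" are made up; "2G" plays the part of "30G" in the example.

```shell
# Three one-word lines: one, two, three.
three() { printf '%s\n' one two three; }

double_spaced=$(three | sed G)   # extra blank line after every line
after_2=$(three | sed 2G)        # extra blank line after line 2 only

printf '%s\n---\n%s\n' "$double_spaced" "$after_2"
```

The shell drops trailing newlines when it stores output in a variable, so the blank line after the final "three" doesn't show up in "double_spaced" here.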
"=" - LIST WITH LINE NUMBER
Here's another peculiar one, and again the three different SED
versions I have treat it differently.
sed = inputfile
lists all lines in the file, with their numbers (artificially)
inserted right above each line. Typical kind of display:
1
Here's the first line.
2
Here's the second line.
3
Here's the third line.
With the no-print switch (i.e.: "sed -n = inputfile"):
1
2
3
The command accepts only a single address, which could be either
a line number, expression, or regular expression. This:
sed /second/= inputfile
would result in:
Here's the first line.
2
Here's the second line.
Here's the third line.
The command:
sed -n /second/= inputfile
would get you:
2
In other words, it's a way to print out the numbers of all lines
containing specific text. The command:
sed -n $= inputfile
would simply display the last line number for the file. It could
be useful for determining how many lines there are in a group of
files. Here's one instance in which you can use an ambiguous file
name:
sed -n $= input*.*
SED would behave as if all the files had first been concatenated
into a single file. It would read to the end of this artificial
new, larger file, and tell you the ending line number - the total
of all line numbers, that is.
Well ... it's limited. I leave it to your own particular or
peculiar genius to figure out how to make good use of it. Doesn't
seem, uh, overwhelmingly useful to me. But I've been wrong
before. There was this one time back in, I think, 1957 ...
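For what it's worth, here is a sketch of the "=" examples on a UNIX-style shell, with the three-line sample piped in. The variable names are illustrative, and other SED versions may format the numbers differently, as noted above.

```shell
# The three-line sample from the "=" examples.
lines() {
  printf '%s\n' \
    "Here's the first line." \
    "Here's the second line." \
    "Here's the third line."
}

numbered=$(lines | sed =)             # each line preceded by its number
where=$(lines | sed -n '/second/=')   # number of each line containing "second"
count=$(lines | sed -n '$=')          # number of the last line, i.e. the line count

printf '%s\n\nmatch at line: %s\ntotal lines: %s\n' "$numbered" "$where" "$count"
```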
"y" - TRANSLATE CHARACTERS
The "y" command, like the "s" command, changes characters - but
in a very different way from "s".
sed y/e/X/ inputfile
would transform every lower case "e" in the file to a capital
"X." This command requires that there be the same number of
characters on the "search-for" side of the instruction as there
are on the "replace-with" side. It does not appear to accept any
kind of regular expression on the search-for side.
y/a1R/.!3/
would find all occurrences of "a" and change them to "."; all
occurrences of "1" and change them to "!"; all occurrences of "R"
and change them to "3". In other words, the command finds what is
in the first position to the left and changes it to whatever is
in the first position to the right - and so on.
The command doesn't appear to take any modifiers - not in this
version of SED, anyway - although you can specify locations
within the file by the usual and accustomed methods, e.g.:
50y/e/X/
/hello/,/goodbye/y/1/0/
and like that.
It could be useful on occasion, I suppose, but better you should
find a DOS implementation of the UNIX TR utility, which has a lot
more going for it. For instance, the command:
tr -s " " < filename
will strip a file of duplicated spaces so fast it'll make your
head spin. TR also has the capability to do translations, just as
SED's "y" command does, but TR is a good deal more flexible.
"y" doesn't appear to me to be noticeably faster than a
substitution command, but what would *I* know, anyway?
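A small sketch of "y" next to TR, assuming a UNIX-style shell where the standard "tr" reads standard input. The sample strings are made up.

```shell
# "y" works character by character: a -> ".", 1 -> "!", R -> "3".
# Note that the "a" in "and" gets changed too.
swapped=$(printf '%s\n' 'a1R and a1R' | sed 'y/a1R/.!3/')

# "tr -s" squeezes each run of spaces down to a single space.
squeezed=$(printf '%s\n' 'too   many    spaces' | tr -s ' ')

printf '%s\n%s\n' "$swapped" "$squeezed"
```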
CONCLUSION
Wake up. It's time for your sleeping pill.
As verbose as this has been, it is far from complete. As I said
earlier, 'way back there in the Paleolithic (the top of this
file), as I learn more about SED I will update this documentation
and send it to a few bulletin boards, where it can either
languish in sorrow or be snapped up by hordes of fanatical SED-
ites. Whatever.
QUICK REFERENCE

Command line:

    sed [switches] [editing commands] input file name

Switches:

    -n      don't print anything unless told to do so
    -e      the next item on the command line is an
            editing instruction
    -f      the next item on the command line is the
            name of a script file. Script file name cannot
            contain wildcards
    -g      Treat every substitution command as if it
            were modified by the "g" character - whether
            it is or not.

    input file name
            (can contain path information. Probably
            won't work out too well if it contains
            wildcards, however.)

MAJOR INSTRUCTION - substitution (s)

    s/search/replace/   Find "search"; replace w/ "replace"
    s/search//          Find "search" and delete it

Substitution command modifiers:

    g       "global" - everywhere on a line
    p       print changed lines
    w fn    write changed lines to file named "fn"

MAJOR INSTRUCTION - deletion (d)

    /text/d     delete lines containing "text"
    nd          delete line "n" (a numeral)
    a,bd        delete lines from number "a" to number "b"
    /x/,/y/d    delete lines from one containing text "x" to
                one containing text "y"
    /^$/d       used to delete blank lines
    $d          delete last line of file

MAJOR INSTRUCTION - print (p)
    (usually not too useful unless the "-n"
    command-line switch has also been used)

    /text/p     print lines containing "text"
    np          print line "n" (with "n" being a numeral)
    a,bp        print from line "a" through line "b"
    /x/,/y/p    print from first line found to contain text
                "x" to first line found to contain text "y"

MAJOR INSTRUCTION - write file (w)

    w filename  write to file "filename"

MAJOR INSTRUCTION - quick print, then quit (q)

    addressq    write from line 1, down to/including
                a line described by "address" - then quit
                at once ("address" could be line number or
                text, as in 25q)

MAJOR (?) INSTRUCTION - mystery "list" command (l)

    l           For weird and marginally amusing results:
                wraps lines that extend past column 72;
                doesn't properly represent CTRL characters.

MAJOR INSTRUCTION - stuff newline character into work space (G)

    G           Add a new line below *every* line in the file.
    a,bG        Add a new line below *every* line falling within
                the range of from line "a" through line "b" -
                address could be line number, or text.

MAJOR INSTRUCTION - print line numbers (=)

    =           5= would print entire file, printing a "5"
                above line 5. Wowie zowie.
    -n $=       Would print only the last line number for
                the file. Slightly more useful.

MAJOR INSTRUCTION - translate characters (y)

    y/abc/def/  Transform "a" to "d," "b" to "e," and "c" to "f"

( E N D )