Stream Editor


sed (short for S tream ED Itor “stream editor”) is, as awk , a computer program for applying different predefined transformations to a sequential stream of textual data. Sed reads input data line by line, modifies each line according to rules specified in a clean language (called “script sed”), and then returns the contents of the file (default). Although originally written for Unix , by Lee E. McMahon  (en) in 1973 / 1974 ( Bell Labs ), sed is now available on virtually all operating systems have to

Presentation

Sed is often described as a non-interactive text editor . It differs from a conventional editor in that the sequence of processing of the two necessary flows of information (data and instructions) is reversed. Instead of taking the editing commands one by one to apply them to the entire text (which must be completely in memory), sed only scans the text file once, applying all Edit commands on each line. Since only one row at a time is present in memory , sed can handle files of arbitrary size.

Principle of operation

The set of sed commands is based on that of the ed editor . Indeed, the majority of orders operate in a similar way, despite the inversion of the paradigm. For example, the command 25dmeans if it is line 25, then erases it (ie, does not return it to the output) , instead of going to line 25 and erasing- La , as ed executes it. Notable exceptions are the copy and move commands, which apply to a range of rows, and therefore have no direct equivalent in sed. Instead, sed introduces an additional buffer called hold space , as well as commands to handle it.

For example, the ed command to copy line 25 to line 76 ( 25t76) could be encoded into two separate commands in sed ( 25h;76g). The first memorizes the line in the hold space , the second recovers it when it is time.

Use

The following example shows a usual use of sed:

Sed -e 's / Old / New / g' filenameInfo> filename Output

The command smeans substitute (“substitute”). The flag gmeans global , indicating that all occurrences in each row should be replaced. After the first character /is given a regular expression that sed must find. After the second /is specified the expression replacing what he found. The substitution command ( s///) is by far the most powerful and most frequently used sed command.

A possible application of the substitution is the correction of a recurring spelling mistake, or the replacement of all occurrences of an acronym. For example :

Sed -e 's / fileir / file / g' sed.wikipedia> sed.wikipedia.correct
Sed -e 's / WP / Wikipedia / g' sed.wikipedia> sed.wikipedia.correct

You do not have to use ” / ” as a delimiter. The ” # , - . ” characters (non-exhaustive list) can also be used to avoid the “anti-slash” accumulation of “protection” making the syntax difficult to read. The following commands are equivalent:

S / \ / one \ / path \ / to / \ / one \ / other \ / string /
S # / one / path / to # / one / other / string #

Sed can also be used to number the lines of a file, delete the HTML tags of a text, etc. However, on some platforms, the Perl language is often used instead of sed.

On Unix, sed is often used as a filter in a tube :

Production_data | Sed -e 's / x / y /'

In this example, data is generated and then modified on the fly by replacing the first occurrence of x with a y . For a replacement of all occurrences of x the syntax to use is:

Production_data | Sed -e 's / x / y / g'

Several substitutions, or other commands, can be grouped into a file, called the sed script ( subst.sedin the following example), and then applied to the data:

Sed -f subst.sed filenameInput> filename Output

In addition to substitutions, other types of simple treatments are available. For example, the following script removes blank lines or spaces that contain only spaces:

Sed -e '/ ^ * $ / of filename

This example uses metacharacters to form regular expressions :

Meta-character Correspondence
^ Corresponds to the beginning of a line (just before the first character)
$ Corresponds to the end of a line (just after the last character)
. Matches any single character
* Corresponds to one or more occurrences of the preceding character
[ ] Corresponds to any of the characters cited between the brackets

Sed also allows the use of fields to identify certain parts of a string. A field is defined by a regular expression identified by the tags \(and \)the field can then be used with \nwhere nthe field number is (the number of fields is limited to 9). Example: Swap two fields separated by a dash:

S / \ (. * \) - \ (. * \) / \ 2- \ 1 /

Another example: display the last N characters of a string:

S / \ (. * \) \ (. \ {N \} \) / \ 2 /

Two fields are defined: \1which will contain the first characters and \2which will contain the last N characters. Braces in the regular expression that specify the number of occurrences {N}must be protected by an anti-slash.

Programming

Complex structures are possible with sed, insofar as it can be assimilated to a very specialized but simple programming language . For example, thread control can be managed using labels ( labels , label: name tracking) and the branch instruction b. A statement bfollowed by a valid label name will transfer the execution thread to the block following the label. If the label does not exist, then the statement completes the script.

Sed is one of the oldest Unix commands that allows the command line to process data files. It evolved naturally as a successor to the famous grep command . Cousin of the youngest Awk , sed allows the shell script to perform powerful and very useful processes. Sed was probably one of the very first tools of Unix to actually encourage the ubiquitous use of regular expressions. Regarding processing speed, sed is generally faster than Perl , and significantly faster than Awk.

Sed and Awk are often cited as Perl ‘s ancestors and inspirers . In particular, the syntax s///of the previous example is an integral part of the Perl syntax.

The sed language has no variable and has only a primitive goto as a flow control instruction. However, it is turing-complete [1]  [ archive ] .

Extensions

GNU sed integrates several new features such as “in place” edition (the original file is replaced by the result of sed processing). In-place editing is often used to replace an ed script . For example :

$ Cat oumina.txt
Ali all albachmohandiss alaska alahoma allo alala talbidot taligh
$ Sed -i 's / al / d / g' oumina.txt
$ Cat oumina.txt
Di dl dba dba dba dd dd ddd ddd ddd ddd dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd

Can be used instead of

$ Ed oumina.txt
1, $ s / al / d / g
W
Q

There is an improved version of sed, called super-sed ( ssed [2]  [ archive ] ), which includes regular expressions compatible with Perl.

Example

The following example shows how sed, which usually processes each row separately, can suppress line breaks from rows whose rows begin with a space.

To illustrate the desired effect, consider the following:

This is my cat
 My cat's name is betty
This is my dog
 My dog's name is frank

The sed command will give:

This is my cat's name is betty
This is my dog's name is frank

Here is the command :

Sed 'N; s / \ n / / g; P; D;

Which can be decomposed in the following way ( ;allow them to separate commands):

N; s ​​/ \ n / / g; P; D;
N also reads the next line
 S / \ n / / g makes the substitution ...
 / \ N / ... of a line break followed by a space ...
 / / ... by a simple space ...
 G ... for all occurrences
 P returns the result of the processing
 D deletes what remains so that it does not show
 The next line twice

More concretely we can group the headers of a file type mbox (for example), by unrolling it and removing the bodies:

#file named headers.sed
#command sed -nf headers.sed / var / mail / my_mbox
: Start #etiquette start
/ ^ From / {# address "^ From"
 NOT; #add a line to the current pattern
 S / \ n \ (\ t \ | \) //;
 /.* \ n $ / {# to ​​addresses \ n $
 P # display the current pattern
 B end #configuration on end
 }
 B start #configuration on start
}
: End # label end