SED Scripts

Here are some scripts that are really helpful. Enjoy!

insert after column [000606]
Task: Insert a character/string after the n-th column in every line.
Example:
	IN:
	1234567890123456
	abcdefghijklmnop
	The quick brown

	OUT:
	123|456789|01|23456
	abc|defghi|jk|lmnop
	The| quick| b|rown
Idea: Use '^' to anchor the line at the beginning and use '.' to match each character within the line. Group the characters with "\(\)" pairs and use back references in the replacement ("\1", "\2", etc); insert the additional characters/strings between the back references.
Solution:
	sed -e 's/^\(...\)\(......\)\(..\)/\1|\2|\3|/'
If you don't want to count the dots, then you can use "\{n\}" for an exact repetition of a single dot:
	sed -e 's/^\(.\{3\}\)\(.\{6\}\)\(.\{2\}\)/\1|\2|\3|/'
Problem posed by Fred Distenfeld Fred_Distenfeld@lnotes5.bankofny.com [000605]
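
The solution can be tried on the example input as follows. This is a minimal sketch; the function names "insert_bars" and "bar_after_col" are made up for illustration and are not part of the original solution. The second function assumes you only need a single bar after column N and uses "&" (the whole match) instead of groups.

```shell
#!/bin/sh
# insert_bars: put a "|" after columns 3, 9 and 11 of every input line.
# Lines shorter than 11 characters do not match and pass through unchanged.
insert_bars() {
  sed -e 's/^\(.\{3\}\)\(.\{6\}\)\(.\{2\}\)/\1|\2|\3|/'
}

# bar_after_col N: hypothetical helper that inserts one "|" after column N.
# "&" in the replacement stands for the text matched by the pattern.
bar_after_col() {
  sed -e "s/^.\{$1\}/&|/"
}

printf '%s\n' '1234567890123456' 'abcdefghijklmnop' 'The quick brown' | insert_bars
printf '%s\n' 'abcdef' | bar_after_col 3
```

The first command reproduces the OUT block of the example; the second prints "abc|def".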

8bit to HTML
Task: Convert all (well, most) "8bit characters" aka "ISO characters" (ASCII 128-255) to their equivalent in HTML.
8bit2html.sed
Author: Xose Ramos in98xora@mikkeliamk.fi [980421]
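
The script itself is not reproduced here. As a rough idea of what such a conversion looks like, here is a small sketch that handles only three ISO-8859-1 characters. It is not the 8bit2html.sed script; the function name "latin1_to_html" and the choice of characters are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: converts three ISO-8859-1 bytes to HTML entities.
# printf builds the raw bytes from octal escapes:
#   \344 = a-umlaut, \366 = o-umlaut, \374 = u-umlaut
# LC_ALL=C makes sed treat the input as plain bytes.
# "\&" in the replacement is a literal "&" (a bare "&" would mean "the match").
latin1_to_html() {
  a=$(printf '\344'); o=$(printf '\366'); u=$(printf '\374')
  LC_ALL=C sed -e "s/$a/\&auml;/g" -e "s/$o/\&ouml;/g" -e "s/$u/\&uuml;/g"
}

printf 'K\344se, sch\366n, f\374r\n' | latin1_to_html
```

A complete script would need one such substitution for every character it wants to cover.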

Extracting IP numbers
IP numbers are "number addresses" of hosts. They typically appear in the log files of server/client programs, and extracting them from the logs is a frequent task for many programmers. An IP number consists of four numbers with 8bit values (0-255), separated by (three) dots.
Example: 127.0.0.1 - the default address of the "localhost".
sed -n 's/[^0-9]*\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\).*/\1/p' infile > outfile
	string		read as		explanation
	s		"substitute"	the substitute command
	/		delimiter	start of pattern
	[^0-9]*		non-digits	need not be anchored to '^';
					use of ".*" would take away
					from the following digits!
	\(		"start group"	start of IP number
	[0-9]\{1,3\}	a number	at least one, at most three digits
	\.		a literal dot	separating the numbers
	[0-9]\{1,3\}	a number	at least one, at most three digits
	\.		a literal dot	separating the numbers
	[0-9]\{1,3\}	a number	at least one, at most three digits
	\.		a literal dot	separating the numbers
	[0-9]\{1,3\}	a number	at least one, at most three digits
	\)		"end group"	end of IP number
	.*		"everything"	this should match the rest of the line
	/		delimiter	separates input pattern from
					substitution pattern
	\1		first group	references the matched first group;
					this should contain the IP number
	/		delimiter	ends the substitution pattern
	p		"print"		prints the line if a substitution
					was made (needed because of "-n")
Result of email with Adam Brothers abrothers@comfax.com [980121]
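
To see the command at work, here is a small sketch that wraps it in a function and feeds it a few sample lines. The function name "extract_ip" and the log lines are made up for illustration.

```shell
#!/bin/sh
# extract_ip: print the first dotted quad found on each line;
# lines without one are suppressed by "-n".
extract_ip() {
  sed -n 's/[^0-9]*\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\).*/\1/p'
}

# Hypothetical log lines, not taken from any real program.
printf '%s\n' \
  'connect from 192.168.1.10 port 22' \
  'no address on this line' \
  'host 10.0.0.5 disconnected' | extract_ip
```

This prints 192.168.1.10 and 10.0.0.5. Two limits are worth knowing: the pattern does not check that each number is in the range 0-255, and if other digits appear before the address (a timestamp, say), the match starts later in the line and the text before it is left in the output.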

Mail Folder Weedout
Do you keep a mail log? Do you read all those "Received:" lines when looking at mails? No? Then you might as well do away with them! Here is the script:

# 950614, 950623, 951018, 951020
# Purpose: Weeds (deletes) unwanted header lines from "folders"
# (text files containing emails)
# Installation: Save this script as "weedout.sed".
# Usage: sed -f weedout.sed folder > folder.weeded
#
:again
/^Received:/{
N
s/^.*\n//
:blah
/^[ ]/{
N
s/^.*\n//
b blah
}
b again
}
#
# Comment the lines which you want to keep with a "#"
# NOTE: Case of characters matters!
#
/^Approved:/d
/^Content-.*:/d
/^Distribution:/d
/^Errors-to:/d
/^Errors-To:/d
/^Full-Name:/d
/^In-Reply-To:/d
/^Lines:/d
/^Message-ID:/d
/^MIME-Version:/d
/^Message-Id:/d
/^Mime-Version:/d
/^NNTP-Posting-Host:/d
/^Organisation:/d
/^Organization:/d
/^Path:/d
/^Phone:/d
/^Post:/d
/^Precedence:/d
/^Received:/d
/^References:/d
/^Return-Receipt-to:/d
/^Return-Receipt-To:/d
/^Reply-To:/d
/^Resent:/d
/^Return-Path:/d
/^Sender:/d
/^Sent:/d
/^Status:/d
/^Supercedes:/d
/^Supersedes:/d
/^Telephone:/d
/^X-[a-zA-Z-]*:/d
# /^X-Face:/d
# /^X-Location:/d
# /^X-Mailer:/d
# /^X-Status:/d
# /^X-Sun-Charset:/d
# /^X-URL:/d
# /^X400/d
# End of script
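
The interesting part of the script is the loop that removes a "Received:" line together with its continuation lines. The sketch below isolates that loop so it can be tried on a tiny sample. The function name "weed_received" and the sample message are made up, and the continuation test uses "[[:space:]]" on the assumption that continuation lines start with a space or a tab.

```shell
#!/bin/sh
# weed_received: delete "Received:" headers including their continuation
# lines, plus "Status:" as one example of a simple one-line deletion.
weed_received() {
  sed '
:again
/^Received:/{
# append the next line, then drop everything up to the newline
N
s/^.*\n//
:blah
# a line starting with whitespace continues the header: drop it too
/^[[:space:]]/{
N
s/^.*\n//
b blah
}
# the line now in the pattern space may be another Received: header
b again
}
/^Status:/d'
}

# Hypothetical sample message; addresses and hosts are invented.
printf 'From: alice@example.org\nReceived: from a.example.org\n\tby b.example.org\nReceived: from c.example.org\nSubject: hello\nStatus: RO\n\nbody text\n' | weed_received
```

Only the "From:" and "Subject:" headers, the empty line and the body remain. Note that "N" needs a following line to read, so the loop assumes that a "Received:" header is never the last line of the file.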

ELM filter log weedout
This script changes the text that ELM's "filter" program writes to its logfile so that you will save some bytes. I got annoyed by the long strings "Mailing message to" and "Message saved in folder" and by the slashes in the date format, so I did away with them. Also, my home directory was always written in full length so I cut it down to the name of the directory that contains the inboxes (folders where incoming mail is filtered to).

Here it is:

s/^filter (\(.*\) guckes): Mailing message to/\1/
s/^filter (\(.*\) guckes): Message saved in folder/\1/
s/\/home\/emailer\/guckes\/NEW\///
s/.new$//
s/0-//
s/guckes/IN/
s/^\(..\)\/\(..\)\/\(..\)/\3\1\2/
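
The last command of the script is the one that removes the slashes from the date: it swaps three two-character groups so that a date written as xx/yy/zz becomes zzxxyy. Here is a minimal sketch of just that step. The function names and the sample line are made up; the real log format of "filter" is not shown here.

```shell
#!/bin/sh
# reorder_date: turn a leading "xx/yy/zz" into "zzxxyy".
reorder_date() {
  sed 's/^\(..\)\/\(..\)\/\(..\)/\3\1\2/'
}

# Same substitution with "," as the delimiter, so that the slashes
# in the pattern need no backslashes.
reorder_date_comma() {
  sed 's,^\(..\)/\(..\)/\(..\),\3\1\2,'
}

# Hypothetical input line.
printf '06/14/95 some log text\n' | reorder_date
printf '06/14/95 some log text\n' | reorder_date_comma
```

Both commands print "950614 some log text". Choosing a delimiter other than "/" is also a way to make the home directory substitution easier to read.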

LaTeX Umlauts to HTML umlauts
Ever wanted to convert a LaTeX text to HTML? Some of the available scripts do not convert the umlauts - so I had to find a quick solution to this problem.

Again, this is fairly simple:

s/\\"a/\&auml;/g
s/\\"A/\&Auml;/g
s/\\"o/\&ouml;/g
s/\\"O/\&Ouml;/g
s/\\"u/\&uuml;/g
s/\\"U/\&Uuml;/g
s/{\\ss}/\&szlig;/g
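
A quick way to try this out is to pipe a line of LaTeX text through a few of the substitutions. This is only a sketch: the function name "latex_umlauts_to_html" and the sample text are made up, and only the three lowercase umlauts are handled.

```shell
#!/bin/sh
# latex_umlauts_to_html: replace \"a, \"o and \"u with HTML entities.
# In the pattern, "\\" matches one literal backslash.
# In the replacement, "\&" is a literal "&"; a bare "&" would insert
# the matched text instead.
latex_umlauts_to_html() {
  sed -e 's/\\"a/\&auml;/g' -e 's/\\"o/\&ouml;/g' -e 's/\\"u/\&uuml;/g'
}

printf '%s\n' 'sch\"on, f\"ur, K\"ase' | latex_umlauts_to_html
```

This prints "sch&ouml;n, f&uuml;r, K&auml;se".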