I never learned `sed` or `awk`. Or even Perl. But I'm pretty good with Java's regex, and I'm familiar with the new text formatting facilities in Java 5.
So rather than tricking myself into learning `sed` and `awk`, I wrote my own stream processor that uses Java's regex and pattern syntax:
_jessewilson$_ **Rip.java**
Usage: Rip [flags] <regex> <format>
regex: a Java regular expression, with groups
http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html
you can (parenthesize) groups
\s whitespace
\S non-whitespace
\w word characters
\W non-word
format: a Java Formatter string
http://java.sun.com/javase/6/docs/api/java/util/Formatter.html
%[argument_index$][flags][width][.precision]conversion
'%s', '%1$s' - the full matched text
'%2$s' the first (parenthesized) group
Use 'single quotes' to prevent bash from interfering
flags:
--skip_unmatched: ignore input that doesn't match <regex>
-s:
--newline <text>: use <text> to separate lines in output
-n <text>:`</pre>So it takes Java regexes in, finds the matching groups in parentheses, and spits them back out using String.format. Here are some examples:<pre>`
_jessewilson$_ echo "7278 ttys001 0:00.66 ssh jessewilson.publicobject.com" |
**Rip.java 'ssh.*' '%s'**
ssh jessewilson.publicobject.com
_jessewilson$_ echo "http://publicobject.com/glazedlists/ Glazed Lists Homepage" |
**Rip.java 'http://([\w.]+)\S*\s+(.*)' '%3$s: %2$s'**
Glazed Lists Homepage: publicobject.com`</pre>These examples are just the tip of the iceberg. I expect to use this tool to munge the output of many processes into input for many others.
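
If you're wondering how the groups line up with the format arguments, here's a rough sketch of the core idea. This is not the actual Rip.java source; the class name `RipSketch` and the method `rip` are made up for illustration, and flag handling is left out. It assumes the regex is applied with `Matcher.find()`, which is consistent with the `ssh.*` example above, where only part of the line is printed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the regex-to-format mapping, not the real Rip.java.
final class RipSketch {

  /**
   * Formats the first match of regex in line, or returns null when
   * nothing matches. Format argument 1 is the full matched text and
   * arguments 2..n+1 are the parenthesized groups 1..n.
   */
  static String rip(Pattern regex, String format, String line) {
    Matcher matcher = regex.matcher(line);
    if (!matcher.find()) {
      return null; // the caller decides whether to skip or echo the line
    }
    Object[] args = new Object[matcher.groupCount() + 1];
    for (int i = 0; i <= matcher.groupCount(); i++) {
      args[i] = matcher.group(i); // group(0) is the whole match
    }
    return String.format(format, args);
  }

  public static void main(String[] unused) {
    // prints "ssh jessewilson.publicobject.com"
    System.out.println(rip(
        Pattern.compile("ssh.*"), "%s",
        "7278 ttys001 0:00.66 ssh jessewilson.publicobject.com"));

    // prints "Glazed Lists Homepage: publicobject.com"
    System.out.println(rip(
        Pattern.compile("http://([\\w.]+)\\S*\\s+(.*)"), "%3$s: %2$s",
        "http://publicobject.com/glazedlists/ Glazed Lists Homepage"));
  }
}
```

Because the whole match takes the first argument slot, the first parenthesized group ends up at `%2$s` rather than `%1$s`, which is the off-by-one the usage text warns about.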
### Try Rip Out
Download [Rip.java](http://publicobject.com/publicobject/rip/Rip.java), make it executable (`chmod a+x Rip.java`) and put it somewhere on your path. In what is almost certainly more clever than useful, I hacked it up so the uncompiled source can be executed directly by Bash:
<pre class="prettyprint">`/*bin/mkdir /tmp/rip 2> /dev/null
javac -d /tmp/rip $0
java -cp /tmp/rip Rip "$@"
exit
*/
import java.io.*;
import java.util.*;
import java.util.regex.*;
public class Rip {
...
}`</pre>Replace my clever hack with a `.class` file and wrapper script if you'd prefer.