Rip.java: stream manipulation for Java programmers
I never learned sed or awk. Or even Perl. But I'm pretty good with Java's regex, and I'm familiar with the new text formatting facilities in Java 5. So rather than tricking myself into learning sed and awk, I wrote my own stream processor that uses Java's regex and format string syntax:
jessewilson$ Rip.java
Usage: Rip [flags] <regex> <format>
regex: a Java regular expression, with groups
http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html
you can (parenthesize) groups
\s whitespace
\S non-whitespace
\w word characters
\W non-word
format: a Java Formatter string
http://java.sun.com/javase/6/docs/api/java/util/Formatter.html
%[argument_index$][flags][width][.precision]conversion
'%s', '%1$s' - the full matched text
'%2$s' the first (parenthesized) group
Use 'single quotes' to prevent bash from interfering
flags:
--skip_unmatched: ignore input that doesn't match <regex>
-s:
--newline <text>: use <text> to separate lines in output
-n <text>:
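To make the argument indexes in the usage text concrete, here is a small sketch that builds the Formatter arguments from a regex match in the order the usage describes: the full match first, then each parenthesized group. The GroupArgs class and toArgs method are hypothetical names, not part of Rip.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: shows how regex groups become Formatter arguments. */
class GroupArgs {

  /** Returns {group(0), group(1), ..., group(n)}: the full match, then each group. */
  static Object[] toArgs(Matcher matcher) {
    Object[] args = new Object[matcher.groupCount() + 1];
    for (int i = 0; i <= matcher.groupCount(); i++) {
      args[i] = matcher.group(i);
    }
    return args;
  }

  public static void main(String[] args) {
    Matcher m = Pattern.compile("(\\w+):\\s*(\\w+)").matcher("color: blue");
    if (m.find()) {
      Object[] formatArgs = toArgs(m);
      // %s and %1$s both take the first argument: the full match.
      System.out.println(String.format("%s", formatArgs));               // color: blue
      // %2$s is the first group, %3$s the second.
      System.out.println(String.format("%3$s is the %2$s", formatArgs)); // blue is the color
    }
  }
}
```

Because group(0) occupies argument 1, the format indexes always run one ahead of the group numbers.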
So it takes a Java regex, finds the matching groups in parentheses, and then spits them back out using String.format. Here are some examples:
jessewilson$ echo "7278 ttys001 0:00.66 ssh jessewilson.publicobject.com" |
Rip.java 'ssh.*' '%s'
ssh jessewilson.publicobject.com
jessewilson$ echo "http://publicobject.com/glazedlists/ Glazed Lists Homepage" |
Rip.java 'http://([\w.]+)\S*\s+(.*)' '%3$s: %2$s'
Glazed Lists Homepage: publicobject.com
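Notice that 'ssh.*' matches partway through the ps line, so Rip evidently searches each line for the pattern rather than requiring the whole line to match. In java.util.regex terms, that's Matcher.find() rather than Matcher.matches(). This sketch reproduces both examples with find(); the RipExamples class and rip method are hypothetical, and it assumes the full-match-first argument order from the usage text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch reproducing the two examples above; not the Rip source. */
class RipExamples {

  /** Formats the first match of regex in line, or returns null if there is none. */
  static String rip(String regex, String format, String line) {
    Matcher matcher = Pattern.compile(regex).matcher(line);
    if (!matcher.find()) { // find() searches; matches() would require the whole line to match
      return null;
    }
    Object[] args = new Object[matcher.groupCount() + 1];
    for (int i = 0; i < args.length; i++) {
      args[i] = matcher.group(i); // args[0] is the full match, args[i] is group i
    }
    return String.format(format, args);
  }

  public static void main(String[] args) {
    String ps = "7278 ttys001 0:00.66 ssh jessewilson.publicobject.com";
    System.out.println(rip("ssh.*", "%s", ps)); // ssh jessewilson.publicobject.com
    // matches() fails because the line doesn't start with "ssh"
    System.out.println(Pattern.compile("ssh.*").matcher(ps).matches()); // false

    String bookmark = "http://publicobject.com/glazedlists/ Glazed Lists Homepage";
    System.out.println(rip("http://([\\w.]+)\\S*\\s+(.*)", "%3$s: %2$s", bookmark));
    // Glazed Lists Homepage: publicobject.com
  }
}
```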
These examples are just the tip of the iceberg. I suspect I'll be using this tool to munge the output of many processes into the input of many others.
Try Rip Out
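Download Rip.java, make it executable (chmod a+x Rip.java), and put it somewhere on your PATH. In what is almost certainly more clever than useful, I hacked it up so that Bash can execute the uncompiled source directly. When launched from Bash, a file with no #! line is run as a shell script, and /*bin/mkdir is a glob that expands to /bin/mkdir; javac, meanwhile, sees those same shell lines as the start of a /* ... */ comment:
/*bin/mkdir /tmp/rip 2> /dev/null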
javac -d /tmp/rip "$0"
java -cp /tmp/rip Rip "$@"
exit
*/
import java.io.*;
import java.util.*;
import java.util.regex.*;
public class Rip {
...
}
Replace my clever hack with a .class file and a wrapper script if you'd prefer.
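The class body is elided above. If you'd rather build something similar yourself, here is one hypothetical way the pieces could fit together: parse the flags from the usage text, read standard input line by line, and format each match. This is a sketch, not the actual Rip source, and two behaviors are guesses: lines that don't match are passed through unchanged unless --skip_unmatched is given, and the --newline text is written after every output line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a Rip-like stream processor; not the original source. */
class RipSketch {

  public static void main(String[] args) throws IOException {
    PrintWriter out = new PrintWriter(System.out);
    int status = run(args, new InputStreamReader(System.in), out);
    out.flush();
    System.exit(status);
  }

  /** Processes input per the flags; returns 0 on success, 1 on bad usage. */
  static int run(String[] args, Reader input, Writer output) throws IOException {
    boolean skipUnmatched = false;
    String newline = "\n"; // assumed default separator
    int i = 0;
    // Consume leading flags. Note: a regex starting with '-' would be mistaken for a flag here.
    while (i < args.length && args[i].startsWith("-")) {
      String flag = args[i++];
      if (flag.equals("--skip_unmatched") || flag.equals("-s")) {
        skipUnmatched = true;
      } else if (flag.equals("--newline") || flag.equals("-n")) {
        if (i == args.length) {
          System.err.println("Missing value for " + flag);
          return 1;
        }
        newline = args[i++];
      } else {
        System.err.println("Unrecognized flag: " + flag);
        return 1;
      }
    }
    if (args.length - i != 2) {
      System.err.println("Usage: Rip [flags] <regex> <format>");
      return 1;
    }
    Pattern pattern = Pattern.compile(args[i]);
    String format = args[i + 1];

    BufferedReader reader = new BufferedReader(input);
    PrintWriter writer = new PrintWriter(output);
    for (String line; (line = reader.readLine()) != null; ) {
      Matcher matcher = pattern.matcher(line);
      if (matcher.find()) {
        Object[] formatArgs = new Object[matcher.groupCount() + 1];
        for (int g = 0; g < formatArgs.length; g++) {
          formatArgs[g] = matcher.group(g); // full match first, then each group
        }
        writer.print(String.format(format, formatArgs));
      } else if (skipUnmatched) {
        continue; // drop the line entirely, separator included
      } else {
        writer.print(line); // assumption: unmatched lines pass through unchanged
      }
      writer.print(newline); // assumption: the separator follows every output line
    }
    writer.flush();
    return 0;
  }
}
```

Keeping the loop in run(), which returns a status instead of calling System.exit() itself, means it can be exercised against a StringReader and StringWriter without spawning a process.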