
Rip.java: stream manipulation for Java programmers

I never learned sed or awk. Or even Perl. But I'm pretty good with Java's regex, and I'm familiar with the new text formatting facilities in Java 5.

So rather than tricking myself into learning sed and awk, I wrote my own stream processor that uses Java's regex and pattern syntax:
jessewilson$ Rip.java
Usage: Rip [flags] <regex> <format>

regex: a Java regular expression, with groups
http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html
you can (parenthesize) groups
\s whitespace
\S non-whitespace
\w word characters
\W non-word

format: a Java Formatter string
http://java.sun.com/javase/6/docs/api/java/util/Formatter.html
%[argument_index$][flags][width][.precision]conversion
'%s', '%1$s' - the full matched text
'%2$s' the first (parenthesized) group

Use 'single quotes' to prevent bash from interfering

flags:
--skip_unmatched: ignore input that doesn't match <regex>
-s:

--newline <text>: use <text> to separate lines in output
-n <text>:
So it takes Java regexes in, finds the matching groups in parentheses, and then spits those back out using String.format. Here are some examples:

jessewilson$ echo "7278 ttys001 0:00.66 ssh jessewilson.publicobject.com" |
Rip.java 'ssh.*' '%s'
ssh jessewilson.publicobject.com

jessewilson$ echo "http://publicobject.com/glazedlists/ Glazed Lists Homepage" |
Rip.java 'http://([\w.]+)\S*\s+(.*)' '%3$s: %2$s'
Glazed Lists Homepage: publicobject.com
These examples are just the tip of the iceberg. I suspect I'll be using this tool to munge the output of many processes into the input for many others.
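
The regex-in, String.format-out idea can be sketched in a few lines of Java. This is a minimal illustration, not the actual Rip source: the class name RipSketch and the method rip are hypothetical. It assumes each line is searched with Matcher.find(), and that the full match is passed as the first format argument followed by each parenthesized group, which is consistent with the usage text above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RipSketch {
    /**
     * Formats the first match of regex within line using a Formatter string.
     * Argument 1 is the full matched text; argument N+1 is group N.
     * Returns null when the line does not match (hypothetical behavior).
     */
    public static String rip(String regex, String format, String line) {
        Matcher matcher = Pattern.compile(regex).matcher(line);
        if (!matcher.find()) {
            return null;
        }
        // group(0) is the whole match, group(1..n) are the parenthesized groups
        Object[] args = new Object[matcher.groupCount() + 1];
        for (int i = 0; i <= matcher.groupCount(); i++) {
            args[i] = matcher.group(i);
        }
        return String.format(format, args);
    }

    public static void main(String[] args) {
        System.out.println(rip("ssh.*", "%s",
                "7278 ttys001 0:00.66 ssh jessewilson.publicobject.com"));
        System.out.println(rip("http://([\\w.]+)\\S*\\s+(.*)", "%3$s: %2$s",
                "http://publicobject.com/glazedlists/ Glazed Lists Homepage"));
    }
}
```

Running main prints the same two lines as the shell examples above. Note that the backslashes are doubled only because the regex sits inside a Java string literal; on the command line, single quotes make that unnecessary.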

Try Rip Out


Download Rip.java, make it executable (chmod a+x Rip.java) and put it somewhere on your path. In what is almost certainly more clever than useful, I hacked it up so the uncompiled source can be executed directly by Bash:
/*bin/mkdir /tmp/rip 2> /dev/null
javac -d /tmp/rip $0
java -cp /tmp/rip Rip "$@"
exit
*/
import java.io.*;
import java.util.*;
import java.util.regex.*;

public class Rip {
...
}

Replace my clever hack with a .class and wrapper script if you'd prefer.
That's great. I don't have a need for this because I know sed and awk well enough. But I love the idea.
Neat. However, you may want to check out the Groovy command-line options. I'm not sure whether they do precisely the same thing, but they cover a lot of similar ground. Naturally, Groovy uses Java regex. See the following page for examples, including usage in a Unix environment.

http://groovy.codehaus.org/Groovy+CLI
I used JavaNativeCompiler (http://jnc.mtsystems.ch/index.html, which uses gcj under the hood) to compile a standalone Windows binary. Now it doesn't even need a JVM to run.