`FileCharSequence` adapts a `java.io.File` as a `CharSequence`, which has nice consequences. For example, you can run Java regular expressions directly against a File. And you can easily send part or all of a file to a `StringBuilder` or `Writer`:
<pre>`
/**
* Adapts a text file as a character sequence so that it can be directly
* manipulated by regular expressions and other character utilities. The
* file may be at most 2 GB in size and encoded with {@code ISO-8859-1};
* otherwise behaviour is undefined.
*/
public final class FileCharSequence implements CharSequence {
...
}`</pre>If you like this, feel free to use [the code](http://code.google.com/p/publicobject/source/browse/trunk/src/com/publicobject/io/FileCharSequence.java) in your projects.
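Since the class body is elided above, here is a minimal sketch of how such an adapter could work. It is not the linked implementation: the class name `MappedFileChars` and its `open` factory are hypothetical, and the sketch assumes a memory-mapped, read-only file under the same constraints the Javadoc states (ISO-8859-1, at most 2 GB).

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical stand-in for FileCharSequence; assumes ISO-8859-1 and at most 2 GB. */
final class MappedFileChars implements CharSequence {
  private final ByteBuffer bytes;
  private final int start; // inclusive offset into the buffer
  private final int end;   // exclusive offset into the buffer

  private MappedFileChars(ByteBuffer bytes, int start, int end) {
    this.bytes = bytes;
    this.start = start;
    this.end = end;
  }

  /** Maps the whole file read-only; the mapping stays valid after the channel closes. */
  static MappedFileChars open(File file) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file, "r");
        FileChannel channel = raf.getChannel()) {
      long size = channel.size();
      if (size > Integer.MAX_VALUE) {
        throw new IOException("File too large for a CharSequence: " + size + " bytes");
      }
      ByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
      return new MappedFileChars(mapped, 0, (int) size);
    }
  }

  @Override public int length() {
    return end - start;
  }

  @Override public char charAt(int index) {
    if (index < 0 || index >= length()) {
      throw new IndexOutOfBoundsException("index: " + index);
    }
    // ISO-8859-1 maps each byte 0x00..0xFF to the same Unicode code point.
    return (char) (bytes.get(start + index) & 0xFF);
  }

  @Override public CharSequence subSequence(int from, int to) {
    if (from < 0 || to > length() || from > to) {
      throw new IndexOutOfBoundsException("from: " + from + ", to: " + to);
    }
    // Shares the underlying buffer, so no bytes are copied.
    return new MappedFileChars(bytes, start + from, start + to);
  }

  @Override public String toString() {
    StringBuilder result = new StringBuilder(length());
    for (int i = 0; i < length(); i++) {
      result.append(charAt(i));
    }
    return result.toString();
  }
}
```

With something like this, `Pattern.compile("\\s+").matcher(MappedFileChars.open(file)).replaceAll(" ")` runs a regex straight over the file's contents, and `new StringBuilder().append(chars, 0, 100)` copies just the first 100 characters.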
I prefer to use Java for one-off text processing tools. Partly this is because that's what my development environment is already set up for, and partly it's because I'm not very productive in Python. With that constraint, I've written `Strip.java`. It uses `FileCharSequence` behind the scenes to strip all occurrences of a regex from a file. It uses Java's regex syntax and supports switches like `(?m)` for multi-line regexes. Just like [the Rip.java tool](http://publicobject.com/2008/08/ripjava-stream-manipulation-for-java.html), it can be executed directly from your command line:
<pre>`
jessewilson:~$ **Strip.java**
Usage: Strip <regex> [files]
regex: a Java regular expression, with groups
http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html
you can (parenthesize) groups
\s whitespace
\S non-whitespace
\w word characters
\W non-word
files: files to strip. These will be overwritten!
flags:
--clobber: overwrite the passed in files rather than creating new ones
-c:
Use 'single quotes' to prevent bash from interfering
`</pre>
This code is also Apache-licensed for your enjoyment. Download Strip.java, make it executable (`chmod a+x Strip.java`) and put it somewhere on your path!
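For a feel of what the stripping step amounts to, here is a rough sketch. It is not the actual `Strip.java` source: the class `StripSketch` and its methods are hypothetical, it reads the whole file into a `String` instead of using `FileCharSequence` so that it stays self-contained, and it takes an explicit target path because the tool's real output naming isn't shown here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

/** Hypothetical sketch of regex stripping; not the real Strip.java. */
final class StripSketch {
  /** Returns the text with every match of the pattern removed. */
  static String strip(Pattern pattern, CharSequence text) {
    return pattern.matcher(text).replaceAll("");
  }

  /**
   * Strips all matches of regex from source and writes the result to target.
   * Passing the same path for both overwrites the file in place.
   * Assumes ISO-8859-1, so every byte round-trips to exactly one char.
   */
  static void stripFile(String regex, Path source, Path target) throws IOException {
    String text = new String(Files.readAllBytes(source), StandardCharsets.ISO_8859_1);
    String stripped = strip(Pattern.compile(regex), text);
    Files.write(target, stripped.getBytes(StandardCharsets.ISO_8859_1));
  }
}
```

Embedded switches work as they do anywhere else in Java regexes, so a pattern such as `(?m)^\s*//.*$\n?` would remove whole-line `//` comments.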