Coding in the small with Google Collections: AbstractIterator

Part 17 in a Series.

I really like the Java Collections API. So much so that I use it even when I'm doing work that isn't particularly collectioney. For example, I recently wrote a quick-n-dirty app that rewrote some files line-by-line. Instead of using a Reader as input, I used an Iterator<String>. The easiest way to create such an iterator is to load the entire file into memory first.

Before:

  public Iterator<String> linesIterator(Reader reader) {
    BufferedReader buffered = new BufferedReader(reader);
    List<String> lines = new ArrayList<String>();

    try {
      for (String line; (line = buffered.readLine()) != null; ) {
        lines.add(line);
      }
    } catch (IOException e) {
      throw new RuntimeException(e);
    }

    return lines.iterator();
  }

That code is simple, but inefficient. And it won't work if the file doesn't fit into memory. A better approach is to implement Iterator and to read through the file on-demand as the lines are requested. Google Collections' AbstractIterator makes this easy. Whenever a new line is requested, its computeNext() method gets called back to read it from the stream.

After:

  public Iterator<String> linesIterator(Reader reader) {
    final BufferedReader buffered = new BufferedReader(reader);

    return new AbstractIterator<String>() {
      protected String computeNext() {
        try {
          String line = buffered.readLine();
          return line != null ? line : endOfData();
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }
    };
  }

This class really takes the fuss out of custom iterators. Now it's not difficult to create iterators that compute a series, process a data stream, or even compose other iterators.
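To make "the fuss" concrete, here is a rough sketch of what the same lines iterator might look like written directly against java.util.Iterator, with no library help. The class name HandRolledLinesIterator and its fields are made up for illustration; this is not code from Google Collections, just one plausible way to do the read-ahead bookkeeping by hand.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical class for comparison only: the bookkeeping that
// AbstractIterator would otherwise handle for us.
final class HandRolledLinesIterator implements Iterator<String> {
  private final BufferedReader buffered;
  private String next;   // line read ahead by hasNext(), or null if none is pending
  private boolean done;  // true once readLine() has returned null

  HandRolledLinesIterator(Reader reader) {
    this.buffered = new BufferedReader(reader);
  }

  @Override public boolean hasNext() {
    // Read ahead at most one line, and only if nothing is pending.
    if (next == null && !done) {
      try {
        next = buffered.readLine();
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
      done = (next == null);
    }
    return next != null;
  }

  @Override public String next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    String line = next;
    next = null;  // consume the pending line
    return line;
  }

  @Override public void remove() {
    throw new UnsupportedOperationException();
  }
}
```

The hasNext()/next() dance, the "done" flag, and the remove() stub are exactly the parts that are easy to get subtly wrong, and they are what collapses into the single computeNext() method in the version above.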
The only problem I have with this code is that it converts any checked exception thrown into an unchecked one. You then have to remember to handle it explicitly in the code that uses this iterator, because the compiler no longer forces you to. I think this can lead to more coding errors.
But then, as the saying goes, with power comes responsibility... :)
This is very cool, I love this series.

One thing that is a bit cumbersome for me is that AbstractIterator does not implement Iterable. So I often find myself having to manually add implements Iterable<String> to the signature, as well as implementing the iterator() method (returning this).

I do this when I am doing something like

  for (String line : new LinesIterator(in)) {
    // process line...
  }

Maybe it is a bad practice in general, to make an iterator iterable. Any better solutions for this situation?
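One possible answer, offered as a sketch rather than a recommendation: keep the iterator a plain Iterator and adapt it at the call site with a small single-use Iterable. The helper below, OnceIterable.once, is a hypothetical name and not part of Google Collections. It fails loudly on a second traversal, whereas an iterator that returns this from iterator() would quietly yield nothing the second time around.

```java
import java.util.Iterator;

// Hypothetical helper: adapts an Iterator to an Iterable that can be
// traversed exactly once, for use in a for-each loop.
final class OnceIterable {
  private OnceIterable() {}

  static <T> Iterable<T> once(final Iterator<T> iterator) {
    return new Iterable<T>() {
      private boolean used;  // not thread-safe; fine for a local loop

      @Override public Iterator<T> iterator() {
        if (used) {
          throw new IllegalStateException("already iterated");
        }
        used = true;
        return iterator;
      }
    };
  }
}
```

With that in place the loop becomes for (String line : OnceIterable.once(linesIterator(in))) { ... }, and the iterator class itself stays free of the Iterable interface.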
Cool stuff, again. Congratulations.

During the (great) Google Collections talk, Kevin talks about reading a stream from Bigtable through Iterables. I wondered if he was just giving an example of a possible use of Iterables, or if you really have that kind of Java interface to Bigtable.
I know that you prefaced this with "quick-n-dirty", but in the long run, wouldn't this approach leak open files? (Ditto for any iterable source that needs to be bracketed by calls to open/close.)

If your source needs to be closed after iteration, any thoughts about use cases that don't finish iterating, like...

String firstLine = linesIterator(fileReader).next();

...and how to plug those leaks?
mk, yeah it's definitely not what you want in a long-running application. Finalization might be a reasonable option here - when the iterator is no longer reachable and gets garbage collected, make sure the file is closed.
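As an alternative to finalization (which only runs whenever the garbage collector gets around to it), here is a sketch of an iterator that also implements Closeable. It closes the reader itself once it hits the end of the data, and lets callers that stop early close it through try-with-resources. The class name ClosingLinesIterator and the firstLine helper are hypothetical, written against the plain JDK rather than AbstractIterator, and the sketch assumes Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical class: a lines iterator that owns its reader and closes it.
final class ClosingLinesIterator implements Iterator<String>, Closeable {
  private final BufferedReader buffered;
  private String next;     // line read ahead by hasNext(), or null
  private boolean closed;  // true after end of data or an explicit close()

  ClosingLinesIterator(Reader reader) {
    this.buffered = new BufferedReader(reader);
  }

  @Override public boolean hasNext() {
    if (next == null && !closed) {
      try {
        next = buffered.readLine();
        if (next == null) {
          close();  // exhausted: release the underlying reader right away
        }
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    }
    return next != null;
  }

  @Override public String next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    String line = next;
    next = null;
    return line;
  }

  @Override public void remove() {
    throw new UnsupportedOperationException();
  }

  @Override public void close() throws IOException {
    closed = true;
    next = null;       // drop any read-ahead line
    buffered.close();  // closing an already closed reader has no effect
  }

  // Usage for the "only read the first line" case from the comment above:
  // try-with-resources closes the reader even though iteration never finishes.
  static String firstLine(Reader reader) throws IOException {
    try (ClosingLinesIterator lines = new ClosingLinesIterator(reader)) {
      return lines.hasNext() ? lines.next() : null;
    }
  }
}
```

The trade-off is that the caller has to know the iterator holds a resource, which a bare Iterator<String> return type hides. That is the price of plugging the leak deterministically.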