Public Object: Coding in the small with Google Collections: AbstractIterator

Part 17 in a Series.

I really like the Java Collections API. So much so that I use collections even when I'm doing work that isn't particularly collectioney. For example, I recently wrote a quick-n-dirty app that rewrote some files line-by-line. Instead of using a Reader as input, I used an Iterator<String>. The easiest way to create such an iterator is to load the entire file into memory first.

Before:

  public Iterator<String> linesIterator(Reader reader) {
    BufferedReader buffered = new BufferedReader(reader);
    List<String> lines = new ArrayList<String>();

    try {
      for (String line; (line = buffered.readLine()) != null; ) {
        lines.add(line);
      }
    } catch (IOException e) {
      throw new RuntimeException(e);
    }

    return lines.iterator();
  }

That code is simple but inefficient, and it won't work if the file doesn't fit into memory. A better approach is to implement Iterator and read through the file on demand as lines are requested. Google Collections' AbstractIterator makes this easy: whenever a new line is requested, its computeNext() method is called back to read the line from the stream.

After:

  public Iterator<String> linesIterator(Reader reader) {
    final BufferedReader buffered = new BufferedReader(reader);

    return new AbstractIterator<String>() {
      protected String computeNext() {
        try {
          String line = buffered.readLine();
          return line != null ? line : endOfData();
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }
    };
  }

This class really takes the fuss out of custom iterators. Now it's not difficult to create iterators that compute a series, process a data stream, or even compose other iterators.
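
As a rough sketch of the first and last of those, here's an iterator that computes a series (Fibonacci numbers up to a limit) and another that composes it by filtering out the odd values. The names SeriesIterators, fibonacciUpTo, and evensOnly are made up for illustration. The small AbstractIterator class at the top is only a simplified stand-in so the example compiles on its own; it is not the library's implementation, and with Google Collections on the classpath you'd drop it and import com.google.common.collect.AbstractIterator instead.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Simplified stand-in for Google Collections' AbstractIterator, included only
// so this sketch compiles without the library. Not the real implementation.
abstract class AbstractIterator<T> implements Iterator<T> {
  private enum State { NOT_READY, READY, DONE }

  private State state = State.NOT_READY;
  private T next;

  // Subclasses return the next element, or call endOfData() when finished.
  protected abstract T computeNext();

  protected final T endOfData() {
    state = State.DONE;
    return null;
  }

  public final boolean hasNext() {
    if (state == State.NOT_READY) {
      next = computeNext();
      if (state != State.DONE) {
        state = State.READY;
      }
    }
    return state == State.READY;
  }

  public final T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    state = State.NOT_READY;
    T result = next;
    next = null;
    return result;
  }

  public void remove() {
    throw new UnsupportedOperationException();
  }
}

class SeriesIterators {
  // Hypothetical helper: Fibonacci numbers (1, 1, 2, 3, ...) no greater than max.
  static Iterator<Long> fibonacciUpTo(final long max) {
    return new AbstractIterator<Long>() {
      private long current = 1;
      private long following = 1;

      protected Long computeNext() {
        if (current > max) {
          return endOfData();
        }
        long result = current;
        long sum = current + following;
        current = following;
        following = sum;
        return result;
      }
    };
  }

  // Hypothetical helper: wraps another iterator and passes through even values only.
  static Iterator<Long> evensOnly(final Iterator<Long> source) {
    return new AbstractIterator<Long>() {
      protected Long computeNext() {
        while (source.hasNext()) {
          Long candidate = source.next();
          if (candidate % 2 == 0) {
            return candidate;
          }
        }
        return endOfData();
      }
    };
  }

  public static void main(String[] args) {
    Iterator<Long> evens = evensOnly(fibonacciUpTo(20));
    while (evens.hasNext()) {
      System.out.println(evens.next()); // prints 2, then 8
    }
  }
}
```

Neither iterator does any work until a caller asks for the next element, so the series is never materialized as a list.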

# posted by Jesse Wilson on Wednesday, August 13, 2008

The only problem I have with this code is that it converts any typed exception thrown to an untyped one. It makes you handle it explicitly in the code that uses this iterator. I think this can lead to more coding errors.
But then, as the saying goes, with power comes responsibility... :)

# posted by Madhat on August 13, 2008 2:32 AM

This is very cool, I love this series.

One thing that is a bit cumbersome for me is that AbstractIterator does not implement Iterable. So I often find myself having to manually add implements Iterable<String> to the signature, as well as implementing the iterator() method (returning this).

I do this when I am doing something like

  for (String line : new LinesIterator(in)) {
    // process line...
  }

Maybe it is a bad practice in general, to make an iterator iterable. Any better solutions for this situation?

# posted by Sam Beran on August 15, 2008 11:23 AM

Cool stuff, again. Congratulations.

During the (great) Google Collections talk, Kevin talks about reading a stream from Bigtable through Iterables. I wondered if he was just illustrating a possible use of Iterables or if you really have that kind of Java interface to Bigtable.

# posted by edward on August 18, 2008 8:53 PM

I know that you prefaced this with "quick-n-dirty", but in the long run, wouldn't this approach leak open files? (Ditto for any iterable source that needs to be bracketed by calls to open/close.)

If your source needs to be closed after iteration, any thoughts about use cases that don't finish iterating, like...

String firstLine = linesIterator(fileReader).next();

...and how to plug those leaks?

# posted by mk on September 4, 2008 4:34 AM

mk, yeah it's definitely not what you want in a long-running application. Finalization might be a reasonable option here - when the iterator goes out-of-scope, make sure the file is closed.
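
A different option from finalization, sketched here under assumptions, is an iterator that closes its reader as soon as it runs out of lines and also implements Closeable so that callers who stop early can close it themselves. ClosingLineIterator and isClosed() are hypothetical names, not part of Google Collections, and the sketch uses only the JDK.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical class for illustration; not part of Google Collections.
class ClosingLineIterator implements Iterator<String>, Closeable {
  private final BufferedReader buffered;
  private String nextLine;
  private boolean closed;

  ClosingLineIterator(Reader reader) {
    this.buffered = new BufferedReader(reader);
  }

  public boolean hasNext() {
    if (nextLine == null && !closed) {
      try {
        nextLine = buffered.readLine();
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
      if (nextLine == null) {
        close(); // end of input: release the reader right away
      }
    }
    return nextLine != null;
  }

  public String next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    String result = nextLine;
    nextLine = null;
    return result;
  }

  public void remove() {
    throw new UnsupportedOperationException();
  }

  // Safe to call more than once. Callers that stop iterating early should
  // call this themselves, typically from a finally block.
  public void close() {
    if (closed) {
      return;
    }
    closed = true;
    nextLine = null;
    try {
      buffered.close();
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  boolean isClosed() {
    return closed;
  }

  public static void main(String[] args) {
    ClosingLineIterator lines = new ClosingLineIterator(new StringReader("first\nsecond\n"));
    try {
      System.out.println(lines.next()); // prints "first"
    } finally {
      lines.close(); // plugs the leak even though iteration stopped early
    }
  }
}
```

This only helps when callers either iterate to the end or remember to call close(), so it doesn't rule out leaks the way a guaranteed cleanup hook would.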

# posted by swankjesse on September 4, 2008 8:25 AM