PUBLIC OBJECT

I Love Control Flow

As discussed previously, I’m working on a file system API for Okio. I’ve already written real & testing implementations, and now I’m adding ZipFileSystem: it views a zip archive as a read-only file system.

Most of the work is reading specs, writing tests, and learning the gotchas in this ubiquitous-but-creaky file format.

But sometimes programming projects like this carry intricate little puzzles to solve to complete a mission. These programming challenges are a lot like the puzzles in Zelda or Uncharted videogames: working on them feels creative and solving them is very satisfying.

Zip Directory Tree

My zip work presented a nice challenge today. The input is a list of ZipEntry objects representing the files and directories in a zip archive. Here’s the ZipEntry class:

class ZipEntry(
  /** A slash-delimited path like `/restaurant/menu.txt`. */
  val path: String,

  /** Offset of this entry in the zip archive. Null for directories. */
  val offset: Long? = null,

  /** Last modified date of this file or directory, if known. */
  val lastModified: Instant? = null
)

I already had a function that returns a path’s parent if it exists.

/** Returns the path of the enclosing directory, or null if there isn't one. */
fun String.parentPath(): String? {
  if (this == "/") return null
  return when (val lastSlash = lastIndexOf("/")) {
    0 -> "/"
    else -> substring(0, lastSlash)
  }
}

The output is a tree of ZipNode objects representing the entries in a tree.

class ZipNode(
  val path: String,
  val offset: Long? = null,
  val lastModified: Instant? = null,
  val children: List<ZipNode>
)

Finally there’s a couple gotchas:

  • Directories may be present in the zip archive (to track the last modified date), or they may be omitted (to save space). If a directory is absent, use null for the last modified date.
  • Entries are in no particular order.

So my challenge was to implement this function:

/** Returns the root of the tree. */
fun buildTree(entries: List<ZipEntry>): ZipNode {
  ...
}

The solution needs to work, be efficient (minimize I/O, memory, and CPU) and be readable (minimize code density).

Data Structures & Algorithms

I really enjoyed writing this code! I spent more time than I should have trying to fix a duplicated statement and ultimately left it in. I also took naughty shortcuts to save allocations.

If you think this problem sounds fun, try solving it! Here’s mine. (My API signature is uglier than what’s above but the structure is the same).