On Files and Okio

Though Okio has been multiplatform for a year, it still lacks support for reading & writing files. I’ve been thinking about what application developers need and what APIs we should build. But the more I consider it, the more I think, ‘fuck, filesystems are awful’.

Files as UI

Computing used to be all about files. Using a computer was an exercise in creating, organizing, collecting, and exchanging files. Every program, document, game, and song was a file. If you had a fancy-enough computer you could have descriptive file names – Lost World Book Report.wpd instead of jurassi2.wpd.

Operating systems had rich UIs for exploring and organizing the filesystem. I used to pore over John Siracusa’s Mac OS X reviews, learning about improvements to Finder and the filesystem features that power it. Did I mention colored files?!

In this era filesystems were also critical to sharing and collaboration. Before Google Docs and Notion, we’d copy our Microsoft Word documents into an organization-wide file server. Corporate data security was implemented through filesystem permissions and access control lists.

These are the use cases for which Java’s filesystem API was designed. The filesystem is supreme and precise control of attributes and metadata is what end users wanted and expected.

[Image: Some of the Java Filesystem APIs]
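To give a flavor of that attribute-centric design, here is a minimal sketch that reads a file’s basic metadata with java.nio.file. The class name FileMetadata and the method describe are hypothetical names chosen for illustration; only the java.nio.file calls are real JDK API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

final class FileMetadata {
    /** Returns a one-line summary of a file's basic attributes. */
    static String describe(Path path) throws IOException {
        // One call fetches size, type, and timestamps together.
        BasicFileAttributes attrs = Files.readAttributes(path, BasicFileAttributes.class);
        return path.getFileName()
                + " size=" + attrs.size()
                + " directory=" + attrs.isDirectory()
                + " modified=" + attrs.lastModifiedTime();
    }

    public static void main(String[] args) throws IOException {
        Path report = Files.createTempFile("book-report", ".wpd");
        Files.write(report, "Life finds a way.".getBytes(StandardCharsets.UTF_8));
        System.out.println(describe(report));
        Files.delete(report);
    }
}
```

The same API also exposes platform-specific views such as POSIX permissions and ACLs, which is the kind of precise control this era’s users expected.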

Files Become Implementation Details

In the early 2000s I used Winamp to listen to MP3 files, ripped from CDs or FTP’d down over dialup. Winamp was files-oriented: I give it MP3 files and it plays ’em.

Winamp’s successor was iTunes. I found it uncomfortable because it hid the files! I put a CD in the slot, push Import, and the album is added to my library. At no point do I pick a directory or name a file. But if I looked I could find the .mp3 files, named and organized automatically.

iTunes’ successor is Spotify. With Spotify there might be files in there somewhere, but they’re not for me to see. The files are hidden implementation details with names like 00E66999-0AE63026.m4p.

This trend isn’t limited to music:

  • Programs moved from the /Applications folder on my computer to an icon on my phone’s launcher. I cannot send my Clock app as an attachment in an email.
  • Documents moved from local files to shared web apps like Figma and Google Docs. (There’s a file browser in Google Drive, but it doesn’t follow filesystem rules or suffer their limitations. Sharing settings are nothing like chmod!)
  • Digital photos used to be .jpeg files on a camera’s SD card. Now I can take a photo, edit, and share it, all without seeing a filename or a directory.

Software development still requires me to muck with .kt and .class files. But with Dark and Codespaces, it’s trending toward leaving files behind too.

File I/O Is Bad

My biggest problem with File I/O is that it’s difficult to program correctly. The failure modes are awful: a write could succeed, fail, or partially succeed! A write that appears to have succeeded might not have; I need a rain dance of fsync() calls to earn positive confirmation that my write was durable.
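On the JVM, the closest thing to fsync() is FileChannel.force(). Here is a minimal sketch of one step of that rain dance, assuming a whole-file overwrite; DurableWrite and writeDurably are hypothetical names, not part of any library.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class DurableWrite {
    /** Writes data to path and asks the OS to flush it to the storage device. */
    static void writeDurably(Path path, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            // A single write() may consume only part of the buffer, so loop.
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            // true = flush file content and metadata, not just content.
            channel.force(true);
        }
    }
}
```

Even this isn’t the whole dance: the sketch doesn’t sync the parent directory, which some filesystems need before a newly created file is itself durable.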

There are more hazards:

  • Filesystems can disappear while I’m writing them (via removed thumb-drives or NFS on flaky Wi-Fi)
  • Disks fill up. (Did I remember to handle this?)
  • I can race another program, or another execution of my own program!
  • A rename or delete operation could fail because the file is open. Windows doesn’t allow renaming or deleting an open file.
  • File paths may be case-sensitive, case-insensitive, or a mix of both.
  • I leak data by writing a file with the wrong permissions.
  • I compromise myself by reading a file that’s been tampered with.
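Some of these hazards have narrow, well-known mitigations. As one example, the race between two programs creating the same file can be settled by asking the filesystem to create it exclusively. This is a minimal sketch under that assumption; ExclusiveCreate and tryCreate are hypothetical names, and it does nothing for the other hazards on the list.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ExclusiveCreate {
    /** Returns true if this call created the file, false if it already existed. */
    static boolean tryCreate(Path path, byte[] data) throws IOException {
        // CREATE_NEW makes the existence check and the creation a single step,
        // so two racing callers cannot both believe they created the file.
        try (OutputStream out = Files.newOutputStream(path,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            out.write(data);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }
}
```

Checking Files.exists() first and then writing would leave a window between the check and the write; CREATE_NEW closes it.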

Despite their general awfulness, we still need files in the implementations of our software.

Coping with Data

Here are my recommendations in order of preference:

  1. Put it in the cloud. Perhaps DynamoDB or Spanner. Make it somebody else’s job to lock it down and back it up. Cloud databases do atomic writes, defend against races, and let you manage access programmatically.
  2. Use a local SQLite database. The SQLite team did the hard work of working around all the filesystem gotchas. It’s easy to use, high performance, and flexible. Plus, SQLDelight makes persistence typesafe.
  3. Do the atomic rename trick. Write a temp file and then rename it to get atomicity. On Android this trick is implemented as AtomicFile. It’s inefficient for updates: to change a single byte we must rewrite the entire file.
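The atomic rename trick can be sketched on the JVM with java.nio.file. This is a minimal illustration, not Android’s AtomicFile: the names AtomicWrite and writeAtomically are hypothetical, and it assumes the target’s directory is writable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class AtomicWrite {
    /** Replaces target's content so readers see the old bytes or the new bytes, never a mix. */
    static void writeAtomically(Path target, byte[] data) throws IOException {
        // Keep the temp file in the same directory so the rename stays on one filesystem.
        Path dir = target.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            Files.write(temp, data);
            // Whether ATOMIC_MOVE replaces an existing target is implementation-specific
            // per the JDK docs; on POSIX systems this maps to rename(), which does.
            Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // A no-op after a successful move; cleans up if the write or move failed.
            Files.deleteIfExists(temp);
        }
    }
}
```

The sketch gets atomicity but not durability: it never forces the temp file to disk before renaming, so a crash at the wrong moment could still lose the update.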

These three choices won’t work for everything. If I ever have to implement a build system or database, I’ll need low-level features like watches and metadata.

Okio + Files

To support files in Okio multiplatform, I’d like to start with the basics:

  • Read a file using a Source
  • Write a file using a Sink
  • Move and delete files
  • Create and delete directories
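For a sense of how small that surface is, here is a JVM-only sketch of the four operations on top of java.nio.file. It is not Okio’s API: BasicFiles and its method names are hypothetical, and InputStream and OutputStream stand in for Source and Sink.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class BasicFiles {
    /** Opens a file for reading (stand-in for a Source). */
    static InputStream source(Path file) throws IOException {
        return Files.newInputStream(file);
    }

    /** Opens a file for writing, creating or truncating it (stand-in for a Sink). */
    static OutputStream sink(Path file) throws IOException {
        return Files.newOutputStream(file);
    }

    static void move(Path from, Path to) throws IOException {
        Files.move(from, to);
    }

    /** Deletes a file or an empty directory. */
    static void delete(Path path) throws IOException {
        Files.delete(path);
    }

    static void createDirectory(Path dir) throws IOException {
        Files.createDirectory(dir);
    }
}
```

Everything on the list that follows (watches, metadata, permissions, and the rest) is deliberately absent from this sketch.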

Is this useful without watches, metadata, permissions, volume management, memory mapping, locking, or symlinks? If files are merely implementation details, it just might be.