Uncertainty in Tests

I’ve been working on OkHttp’s Happy Eyeballs and exploring testing strategies along the way.

Happy Eyeballs is the fun name of RFC 6555, a clever hack that makes it safe to deploy IPv6 even when some clients' IPv6 connectivity is unstable. Here’s how it works:

A client has a list of a server’s IP addresses, containing both IPv6 and IPv4 addresses.
The client orders these to alternate between IPv6 and IPv4 addresses, starting with IPv6.
The client attempts a new connection every 250 milliseconds until any attempt succeeds.
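
The ordering and pacing steps above can be sketched as a pure function. This is only an illustration of the three steps, not OkHttp's code: the names interleave and attemptSchedule are hypothetical, and appending leftover addresses when the two families have different counts is an assumption.

```kotlin
import java.net.Inet6Address
import java.net.InetAddress

/** Alternates IPv6 and IPv4 addresses, starting with IPv6. Leftovers keep their order. */
fun interleave(addresses: List<InetAddress>): List<InetAddress> {
    val (ipv6, ipv4) = addresses.partition { it is Inet6Address }
    val result = mutableListOf<InetAddress>()
    for (i in 0 until maxOf(ipv6.size, ipv4.size)) {
        ipv6.getOrNull(i)?.let { result += it }
        ipv4.getOrNull(i)?.let { result += it }
    }
    return result
}

/** Pairs each ordered address with its start delay: 0 ms, 250 ms, 500 ms, and so on. */
fun attemptSchedule(
    addresses: List<InetAddress>,
    stepMillis: Long = 250L,
): List<Pair<Long, InetAddress>> =
    interleave(addresses).mapIndexed { index, address -> index * stepMillis to address }

/** Usage: literal addresses from the documentation ranges, so no DNS lookup happens. */
fun printExampleSchedule() {
    val addresses = listOf(
        InetAddress.getByName("192.0.2.1"),
        InetAddress.getByName("192.0.2.2"),
        InetAddress.getByName("2001:db8::1"),
    )
    for ((delayMillis, address) in attemptSchedule(addresses)) {
        println("$delayMillis ms: ${address.hostAddress}")
    }
}
```

With these inputs the IPv6 address is attempted first at 0 ms, followed by the two IPv4 addresses at 250 ms and 500 ms.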

Here’s an implementation that needs testing:

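import java.net.Inet6Address
import java.net.InetAddress
import java.net.InetSocketAddress
import java.net.Socket
import java.util.concurrent.CompletableFuture
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

fun happyEyeballs(host: String, port: Int): Socket {
  val future = CompletableFuture<Socket>()
  val executor = Executors.newScheduledThreadPool(0)
  // IPv6 attempts start at 0 ms, 500 ms, ...; IPv4 attempts at 250 ms, 750 ms, ...
  val ipv6Delays = (0L until Long.MAX_VALUE step 500L).iterator()
  val ipv4Delays = (250L until Long.MAX_VALUE step 500L).iterator()

  for (address in InetAddress.getAllByName(host)) {
    val delayMillis = when (address) {
      is Inet6Address -> ipv6Delays.next()
      else -> ipv4Delays.next()
    }

    executor.schedule({
      try {
        val socket = Socket()
        socket.connect(InetSocketAddress(address, port))
        future.complete(socket)
      } catch (ignored: Exception) {
        // A failed attempt is dropped; another attempt may still succeed.
      }
    }, delayMillis, TimeUnit.MILLISECONDS)
  }

  // Block until any attempt succeeds, then cancel the rest.
  val result = future.get()
  executor.shutdownNow()
  return result
}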

But testing this is difficult.

I might test using hostnames that have interesting combinations of IPv4 and IPv6 addresses: localhost, fastly.com, and test-ipv6.com. Such tests will break if those hosts change their addresses or if the Internet is unreachable.

My tests might expect that a known IPv6-supporting host will return an IPv6 address. But there’s a chance that its IPv6 connection will take longer than usual and an unexpected address will win!

Test Strategy: Embrace Uncertainty

Happy Eyeballs is one of many features where production behavior is non-deterministic, racy, and environment-dependent. Production code needs to cope with flaky networks and connections that take longer than usual.

In this testing strategy, we prefer environment-dependent tests because they’re most like production.

The code above is broken if nothing connects. Even if we don’t anticipate this situation, variability in the test environment should discover it for us! We get that test case for free.
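
To make that failure concrete: when every attempt throws, nothing ever completes the future, so future.get() blocks forever. One possible way to surface the failure instead is to count the failed attempts and fail the future on the last one. This is a sketch under assumptions, not the fix OkHttp uses; firstSuccess is a hypothetical helper, and running attempts with CompletableFuture.runAsync stands in for the scheduled connects.

```kotlin
import java.io.IOException
import java.util.concurrent.CompletableFuture
import java.util.concurrent.atomic.AtomicInteger

/** Completes with the first successful attempt, or fails once every attempt has failed. */
fun <T> firstSuccess(attempts: List<() -> T>): CompletableFuture<T> {
    val future = CompletableFuture<T>()
    if (attempts.isEmpty()) {
        future.completeExceptionally(IOException("no addresses to try"))
        return future
    }
    val remaining = AtomicInteger(attempts.size)
    for (attempt in attempts) {
        CompletableFuture.runAsync {
            try {
                // complete() is a no-op if another attempt already won.
                future.complete(attempt())
            } catch (e: Exception) {
                // Only the last failure is reported; earlier ones are dropped.
                if (remaining.decrementAndGet() == 0) future.completeExceptionally(e)
            }
        }
    }
    return future
}
```

A caller that waits on the returned future now gets an ExecutionException when nothing connects, rather than waiting indefinitely.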

Test Strategy: Eliminate Uncertainty

The above code has 3 sources of variability:

Which IP addresses are returned
How long each takes to connect
How the host platform schedules racing threads

We can create fakes for each and write tests that are strictly deterministic.

interface Dns {
  fun lookup(host: String): List<InetAddress>
}

interface SocketConnector {
  fun connect(socket: Socket)
}

interface Scheduler {
  fun schedule(task: Runnable, delay: Long, unit: TimeUnit)
}

With careful use of tools like CountDownLatch (or perhaps a Semaphore in coroutines code), we can choreograph elaborate scenarios. Perhaps we test a host that has only IPv4 addresses, and discover the above code adds an unnecessary delay before the first connect.
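
As a minimal sketch of that kind of choreography, the latch below forces a fake IPv6 attempt to stall until a fake IPv4 attempt has already won. The function name and the string results are hypothetical stand-ins for real sockets.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit
import kotlin.concurrent.thread

/** Races two fake attempts where IPv6 is held back until IPv4 has completed. */
fun raceWithStalledIpv6(): String {
    val winner = CompletableFuture<String>()
    val ipv4Connected = CountDownLatch(1)

    val ipv6Attempt = thread {
        // Stall until the IPv4 attempt has finished, whatever the thread scheduling.
        ipv4Connected.await()
        winner.complete("ipv6") // Returns false: the future is already complete.
    }
    val ipv4Attempt = thread {
        winner.complete("ipv4")
        ipv4Connected.countDown()
    }

    ipv6Attempt.join()
    ipv4Attempt.join()
    return winner.get(1, TimeUnit.SECONDS)
}
```

Because the IPv6 thread cannot pass the latch until after the IPv4 thread has completed the future, the result is "ipv4" on every run.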

Once we fix that bug we can write a regression test to make sure it stays fixed. Should we reintroduce the bug in a future update, the test will catch it.

Another upside of test doubles is that we can speed up time. We don't need 250 milliseconds to test a 250 millisecond delay!
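
Here is one way a fake for the Scheduler interface above might speed up time: it keeps a virtual clock and runs tasks only when the test advances it. FakeScheduler, advanceBy, and nowMillis are hypothetical names, and the sketch is single-threaded by assumption.

```kotlin
import java.util.PriorityQueue
import java.util.concurrent.TimeUnit

interface Scheduler {
    fun schedule(task: Runnable, delay: Long, unit: TimeUnit)
}

/** A Scheduler driven by virtual time. Not thread-safe. */
class FakeScheduler : Scheduler {
    private class Entry(val runAtMillis: Long, val sequence: Long, val task: Runnable)

    // Order by due time, then by insertion order so ties are deterministic.
    private val queue = PriorityQueue<Entry>(compareBy<Entry>({ it.runAtMillis }, { it.sequence }))
    private var nextSequence = 0L

    var nowMillis = 0L
        private set

    override fun schedule(task: Runnable, delay: Long, unit: TimeUnit) {
        queue += Entry(nowMillis + unit.toMillis(delay), nextSequence++, task)
    }

    /** Advances virtual time, running every task that becomes due, in order. */
    fun advanceBy(millis: Long) {
        val target = nowMillis + millis
        while (true) {
            val next = queue.peek() ?: break
            if (next.runAtMillis > target) break
            queue.poll()
            nowMillis = next.runAtMillis
            next.task.run()
        }
        nowMillis = target
    }
}

/** Usage: an attempt starts after 250 ms and its fake connect takes 50 ms. */
fun virtualConnectTimeMillis(): Long {
    val scheduler = FakeScheduler()
    var connectedAtMillis = -1L
    scheduler.schedule(Runnable {
        scheduler.schedule(
            Runnable { connectedAtMillis = scheduler.nowMillis },
            50, TimeUnit.MILLISECONDS,
        )
    }, 250, TimeUnit.MILLISECONDS)

    scheduler.advanceBy(299)
    check(connectedAtMillis == -1L) // Not connected yet at 299 ms.
    scheduler.advanceBy(1)
    return connectedAtMillis
}
```

The usage function returns 300 without any real waiting, since no thread ever sleeps.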

Both Strategies Discover Bugs

Embracing uncertainty exposes our implementation to more scenarios. Using real schedulers and real sockets also gives us confidence that we’re using the APIs properly.

Eliminating uncertainty lets us exercise scenarios that are likely to occur in production but unlikely to occur in test environments. We can be exacting in our expectations of what happens: ‘not only is the IPv4 address chosen, but it takes exactly 300 milliseconds given a 50 millisecond connect delay.’

Test Flakes Feel Bad

What happens when an environment-dependent failure is discovered? On the teams I’ve worked on it goes something like this:

Dave writes an environment-dependent test. It passes on his local machine and again on CI. He gets a green build and merges his change.
A few days later Jenn is working on a different feature. Dave’s test fails on Jenn’s CI run. She’s surprised by the failing test and tries to figure it out.
Jenn (eventually) learns that the test is flaky, feels bad for wasting her time, and asks Dave to fix it. She may also warn the rest of the team to re-run tests if they fail.
Dave can’t reproduce the failure locally – it is environment-dependent after all – so he changes things hastily and hopes for the best.

Late Failures Feel Bad

One common source of non-determinism in business code is the current date and time. If your code incorrectly handles the daylight saving time cutover, tests that only fail for one hour each year won’t help! You only find out about the bug when it’s too late.

When these tests discover bugs, it can be after code is merged and possibly after it's released.
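
For the date-and-time case, one way to pull the failure forward is to pass a java.time.Clock into the code and pin it to the cutover in a test. The sketch below assumes a hypothetical business rule, untilSameTimeTomorrow, and uses the 2024 spring-forward date in America/New_York as the example.

```kotlin
import java.time.Clock
import java.time.Duration
import java.time.ZoneId
import java.time.ZonedDateTime

/** Hypothetical business rule: how long until the same wall-clock time tomorrow. */
fun untilSameTimeTomorrow(clock: Clock): Duration {
    val now = ZonedDateTime.now(clock)
    return Duration.between(now, now.plusDays(1))
}

/** A clock frozen at noon on the given date in New York. */
fun noonInNewYork(year: Int, month: Int, day: Int): Clock {
    val zone = ZoneId.of("America/New_York")
    val noon = ZonedDateTime.of(year, month, day, 12, 0, 0, 0, zone)
    return Clock.fixed(noon.toInstant(), zone)
}

/** Usage: clocks spring forward on March 10, 2024, so March 9 is followed by a short day. */
fun printCutoverExample() {
    println(untilSameTimeTomorrow(noonInNewYork(2024, 3, 9))) // PT23H
    println(untilSameTimeTomorrow(noonInNewYork(2024, 6, 1))) // PT24H
}
```

Code that assumes a day is always 24 hours fails against the first clock on any day of the year, not just during the cutover.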

Avoid Uncertainty!

My recommendation is to avoid nondeterminism in tests. The cost of the free test cases outweighs the benefits.

See also LinkedHashMap, YYYY, and Toeholds.