PUBLIC OBJECT

How do HTTP caching heuristics work?

Suppose you’ve requested a webpage with a Last-Modified date and no other caching headers:

HTTP/1.1 200 OK
Last-Modified: Tue, 16 Dec 2014 06:00:00 GMT

...

The HTTP client will happily store this response in the cache indefinitely. But it won’t serve it from the cache unless it’s still fresh at the time of the next request.

How do we decide whether it’s still fresh? There are three timestamps we’re interested in:

  • Last requested at: Timestamp when we made the last request.
  • Last modified at: What the Last-Modified header said on that response.
  • Now: Timestamp at the time of the current request.

We use the time between last requested at and last modified at to estimate how frequently a document is edited. If a page was modified 5 minutes before the request, it’s assumed to be frequently modified. If it was last modified 5 years before the request, it’s assumed to be infrequently modified.

A page is fresh for 10% of that duration: 10% of 5 minutes is 30 seconds; 10% of 5 years is 6 months. A page is considered fresh until that 10% has elapsed since the document was last requested.
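The 10% rule can be sketched in a few lines of Python. This is a minimal illustration rather than any particular client's implementation: the function name heuristic_freshness_lifetime, the fraction parameter, and the clamping of negative ages to zero are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def heuristic_freshness_lifetime(last_requested_at, last_modified_at, fraction=0.10):
    """Return how long a response counts as fresh under the 10% heuristic.

    Hypothetical helper: the name and the clamping below are assumptions.
    """
    document_age = last_requested_at - last_modified_at
    # A Last-Modified date in the future would give a negative age; treat it as zero.
    return max(document_age, timedelta(0)) * fraction


requested = datetime(2015, 3, 26, 6, 0, tzinfo=timezone.utc)

# Modified 5 minutes before the request: fresh for 30 seconds.
print(heuristic_freshness_lifetime(requested, requested - timedelta(minutes=5)))  # 0:00:30

# Modified 100 days before the request: fresh for 10 days.
print(heuristic_freshness_lifetime(requested, requested - timedelta(days=100)))  # 10 days, 0:00:00
```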

An example.

On March 26, 2015 we download a document that was last modified December 16, 2014, exactly 100 days earlier. We store that document in the cache.

Then 9 days later, on April 4, we request the same document again. The cache is used, because the document is considered fresh for 10 days (10% of 100 days) after the last requested at date.

Two days after that, on April 6, we request it for a third time. This time the cache isn’t used, because we’re 11 days into a 10-day freshness lifetime.
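The same timeline can be checked in code. The sketch below assumes every request happens at 06:00 GMT, so that the first download is exactly 100 days after the Last-Modified time. The is_fresh function and the FRESH_FRACTION constant are hypothetical names for this example.

```python
from datetime import datetime, timedelta, timezone

FRESH_FRACTION = 0.10  # the 10% rule from the text


def is_fresh(now, last_requested_at, last_modified_at):
    """True if the cached response may be served without contacting the server."""
    freshness_lifetime = (last_requested_at - last_modified_at) * FRESH_FRACTION
    time_since_request = now - last_requested_at
    return time_since_request < freshness_lifetime


# Assumed times of day: 06:00 GMT throughout, matching the Last-Modified header.
last_modified_at = datetime(2014, 12, 16, 6, 0, tzinfo=timezone.utc)
last_requested_at = datetime(2015, 3, 26, 6, 0, tzinfo=timezone.utc)

second_request = last_requested_at + timedelta(days=9)   # April 4
third_request = last_requested_at + timedelta(days=11)   # April 6

print(is_fresh(second_request, last_requested_at, last_modified_at))  # True
print(is_fresh(third_request, last_requested_at, last_modified_at))   # False
```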

But wait! Because a stale cached response can still be revalidated with a conditional request, the HTTP client will include an If-Modified-Since header in the request, like so:

GET / HTTP/1.1
If-Modified-Since: Tue, 16 Dec 2014 06:00:00 GMT
...

If the document still hasn’t changed, the server can return a very small 304 Not Modified response, which instructs the client to serve the cached response.
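As a rough sketch of the client side of this exchange, the snippet below builds the If-Modified-Since header from the stored Last-Modified time and picks which body to serve based on the status code. It makes no network calls. The helper names conditional_headers and body_to_serve are hypothetical, and a real client would also need to handle other status codes and validators.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def conditional_headers(last_modified_at):
    """Build the validator header for revalidating a stale cached response."""
    # format_datetime(..., usegmt=True) emits the HTTP date format and
    # requires a timezone-aware datetime in UTC.
    return {"If-Modified-Since": format_datetime(last_modified_at, usegmt=True)}


def body_to_serve(status_code, network_body, cached_body):
    """Pick the body after revalidation: 304 means the cached copy is still good."""
    if status_code == 304:
        return cached_body
    return network_body


last_modified_at = datetime(2014, 12, 16, 6, 0, tzinfo=timezone.utc)
print(conditional_headers(last_modified_at))
# {'If-Modified-Since': 'Tue, 16 Dec 2014 06:00:00 GMT'}
```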

The nice thing about the 304 Not Modified response is that it resets the last requested at timestamp to the time of the revalidation. Our cached response is now 111 days old at time of request (December 16 to April 6), and will be considered fresh for another 11.1 days.
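The effect of a 304 on the freshness lifetime can be sketched as follows. CacheEntry and after_not_modified are hypothetical names, and the 06:00 GMT request times are assumed so that the day counts come out exact.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CacheEntry:
    """Hypothetical record of the two stored timestamps for a cached response."""
    last_modified_at: datetime
    last_requested_at: datetime

    def freshness_lifetime(self):
        # 10% of the document's age at the time it was last requested.
        return (self.last_requested_at - self.last_modified_at) * 0.10

    def fresh_until(self):
        return self.last_requested_at + self.freshness_lifetime()


def after_not_modified(entry, now):
    """A 304 counts as a new request, so only last_requested_at moves forward."""
    return replace(entry, last_requested_at=now)


entry = CacheEntry(
    last_modified_at=datetime(2014, 12, 16, 6, 0, tzinfo=timezone.utc),
    last_requested_at=datetime(2015, 3, 26, 6, 0, tzinfo=timezone.utc),
)
april_6 = datetime(2015, 4, 6, 6, 0, tzinfo=timezone.utc)
revalidated = after_not_modified(entry, april_6)

print(entry.freshness_lifetime())        # 10 days, 0:00:00
print(revalidated.freshness_lifetime())  # 11 days, 2:24:00 (11.1 days)
```

Because the lifetime is recomputed from the new last requested at timestamp, each successful revalidation of an unchanged document stretches the window a little further.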