Cache-Forever Assets

I originally wrote this to help Stoyan out with Web Performance Calendar; republishing here.

A long time ago, we had a client with a performance problem. Their entire web app was slow. The situation with this client’s app was a bit tricky; this client was a team within a very large company, and often—in my experience, anyway—large companies mean that there are a lot of different people/teams who exert control over deployed apps and there’s a lot of bureaucracy in order to get anything done.

The client’s team that had asked us to help with slow page loads only had passive access to logs (they couldn’t easily add new logging), and they were mostly powerless to do things like optimize SQL queries, of which there were logs already, and really only controlled the web app itself, which was a very heavy Java/Spring-based app. The team we were working with knew just enough to maintain the user-facing parts of the app.

We, a contracted team brought in to help with guidance (and we did eventually build some interesting technology for this client), had no direct ability to modify the deployed app, nor did we even get access to the server-side source code. But we still wanted to help, and the client wanted us to help, given all of these constraints. So, we did a bit of what-we-can-see analysis, and came up with a number of simple, but unimplemented optimizations. “Low-hanging fruit” if you will.

These optimizations included things like “improve the size of these giant images (and here’s how to do it without losing any quality)”, “concatenate and minify these CSS and JavaScript assets” (the app was headed by a HTTP 1.x reverse proxy), and “improve user-agent caching”. It’s the last of these that I’m going to discuss here.

Now, before we get any deeper into this, I want to make it clear that the strategy we implemented (or, more specifically: advised the client to implement) is certainly not ground-breaking—far from it. This client, whether due to geographic location, or perhaps being shielded from outside influence within their large corporate infrastructure, had not implemented even the most basic of browser-facing optimizations, so we had a great opportunity to teach them things we’d been doing for years—maybe even decades—at this point.

We noticed that all requests were slow. Even the smallest requests. Static pages, dynamically-rendered for the logged-in user pages, images, CSS, even redirects were slow. And we knew that we were not in a position to do much about this slowness, other than to identify it and hope the team we were in contact with could request that the controlling team look into the more-general problem. “Put the assets on a CDN and avoid the stack/processing entirely” was something we recommended but it wasn’t even something we could realistically expect to be implemented given the circumstances.

“Reduce the number of requests” was already partially covered in the “concatenate and minify” recommendation I mentioned above, but we noticed that because all requests were slow, the built-in strategy of using the stack’s HTTP handler to return 304 not modified if a request could be satisfied via Last-Modified or ETag was, itself, sometimes taking several seconds to respond.

A little background: normally (lots of considerations like cache visibility glossed over here), when a user agent makes a request for an asset that it already has in its cache, it tells the server “I have a copy of this asset that was last modified at this specific time” and the server, once it sees that it doesn’t have a newer copy, will say “you’ve already got the latest version, so I’m not going to bother sending it to you” via a 304 Not Modified response. Alternatively, a browser might say “I’ve got a copy of this asset that you’ve identified to have unique properties based on this ETag you sent me; here’s the ETag back so we can compare notes” and the server will—again, if the asset is already current—send back a 304 response. In both cases, if the server has a newer version of the asset it will (likely) send back a 200 and the browser will use and cache a new version.

It’s these 304 responses that were slow on the server side, like all other requests. The browser was still making the request and waiting a (relatively) long time for the confirmation that it already had the right version in its cache, which it usually did.

The strategy we recommended (remember: because we were extremely limited in what we expected to be able to change) was to avoid this Not Modified conversation altogether.

With a little work at “build” time, we were able to give each of these assets, not only a unique ETag (as determined by the HTTP dæmon itself), but a fully unique URL, based on its content. By doing so, and setting appropriate HTTP headers (more on the specifics of this below), we could tell the browser “you never even need to ask the server if this asset is up to date. We could cache “forever” (in practice: a year in most cases, but that was close enough for the performance gain we needed here).

Fast forward to present time. For our own apps, we do use a CDN, but I still like to use this cache-forever strategy. We now often deploy our main app code to AWS Lambda, and find ourselves uploading static assets to S3, to be served via CloudFront (Amazon Web Services’ CDN service).

We have code that collects (via either a pre-set lookup, or by filesystem traversal) the assets we want to upload. We do whatever preprocessing we need to do to them, and when it’s time to upload to S3, we’re careful to set certain HTTP headers that indicate unconditional caching for the browser:

def upload_collected_files(self, force=False):
    for f, dat in self.collected_files.items():

        key_name = os.path.join(
            self.bucket_prefix, self.versioned_hash(dat["hash"]), f
        )

        if not force:
            try:
                s3.Object(self.bucket, key_name).load()
            except botocore.exceptions.ClientError as e:
                if e.response["Error"]["Code"] == "404":
                    # key doesn't exist, so don't interfere
                    pass
                else:
                    # Something else has gone wrong.
                    raise
            else:
                # The object does exist.
                print(
                    f"Not uploading {key_name} because it already exists, and not in FORCE mode"
                )
                continue

        # RFC 2616:
        # "HTTP/1.1 servers SHOULD NOT send Expires dates more than one year in the future"
        headers = {
            "CacheControl": "public,max-age=31536000,immutable",
            "Expires": datetime.today() + timedelta(days=365),
            "ContentType": dat["mime"],
            "ACL": "public-read",
        }

        self.upload_file(
            dat["path"],
            key_name,
            self.bucket,
            headers,
            dry_run=os.environ.get("DRY_RUN") == "1",
        )

The key name (which extends to the URL) is a shortened representation of a file’s contents, plus a “we need to bust the cache without changing the contents” version on our app’s side, followed by the asset’s natural filename, such as (the full URL): https://static.production.site.faculty.net/c7a1f31f4ed828cbc60271aee4e4f301708662e8a131384add7b03e8fd305da82f53401cfd883d8b48032fb78ef71e5f-2020101000/images/topography-overlay.png

This effectively tells S3 to relay Cache-Control and Expires headers to the browser (via CloudFront) to only allow the asset to expire in a year. Because of this, the browser doesn’t even make a request for the asset if it’s got it cached.

We control cache busting (such as a new version of a CSS, JS, image, etc.) completely via the URL; our app has access (via a lookup dictionary) to the uploaded assets, and can reference the full URL to always be the latest version.

The real beauty of this approach is that the browser can entirely avoid even asking the server if it’s got the latest version—it just knows it does—as illustrated here:

Developer tools showing “cached” requests for assets on faculty.com