HTTP content negotiation on AWS CloudFront

HTTP content negotiation is a mechanism by which web servers consider request headers in addition to the URL when determining which content to include in the response. A common use case for this is response body compression, wherein a server may decide to gzip the content if the request arrived with an Accept-Encoding: gzip header.

Support for content negotiation in HTTP servers is a mixed bag. Apache provides good built-in support for this. NGINX does not offer anything comparable, although a rough approximation is possible via configuration directives. Unfortunately, I can’t find documentation on IIS. AWS S3 static website hosting, which is used to serve this blog, provides no facility for this whatsoever.

Over the past few years, CDNs have evolved to help address this problem in a few ways.

Most CDNs can compress content on the fly, even if the origin only serves uncompressed content. Support for gzip is de rigueur, with CloudFront supporting Brotli as well. In practice, however, this can be limited. For example, AWS CloudFront won't compress anything under 1KB or over 10MB. In addition, compression is typically more effective the more CPU you spend on it, though the relationship is non-linear. For example, running gzip at level 9 can produce content that is tens of percent smaller than level 1, but requires several times the processing power. As a result, CDNs are typically configured to run at fairly low compression levels.

Recently, CDNs have also begun to allow applications to run business logic at the edge. Cloudflare Workers, AWS Lambda@Edge and Fastly VCL are all examples of this.

Felice Geracitano had the clever idea to use Lambda@Edge on AWS CloudFront to implement a bare-bones content negotiation scheme for the purpose of supporting Brotli. While there are some issues with his implementation, the concept of performing content negotiation in JavaScript on the CDN and using the result to fetch a different resource from the origin is a powerful one.

What does a good solution for this look like?

  • The origin server does not need to support content negotiation. This is both cheaper to operate and allows for offline processing of assets (e.g. to compress using gzip level 9 or Zopfli).
  • The content negotiation process should respect quality factors in the HTTP request headers, e.g. Accept-Encoding: gzip, br;q=0.9 indicates that the client prefers gzip if both gzip and br encodings of the content are available.
  • The CDN should serve content with a correct Vary header to ensure that downstream caches are not confused by the content negotiation process.
  • The results of JavaScript content negotiation logic should be cached by the CDN, and only re-executed on CDN cache misses.
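The quality-factor requirement above can be illustrated with a minimal sketch. This is not the code from pgriess/http-content-negotiation-js; the function names parseAcceptEncoding and selectEncoding are hypothetical, and the tie-breaking rule (server preference order) is an assumption made for the example.

```javascript
// Parse an Accept-Encoding header value into [{ encoding, q }] entries.
function parseAcceptEncoding(header) {
  return header
    .split(',')
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      const [name, ...params] = part.split(';').map((s) => s.trim());
      let q = 1; // the quality factor defaults to 1 when absent
      for (const param of params) {
        const [key, value] = param.split('=').map((s) => s.trim());
        if (key.toLowerCase() === 'q') {
          const parsed = parseFloat(value);
          if (!Number.isNaN(parsed)) q = parsed;
        }
      }
      return { encoding: name.toLowerCase(), q };
    });
}

// Pick the encoding the client prefers most among those the origin has.
// `available` is in server preference order (assumption), which breaks ties.
function selectEncoding(header, available) {
  const prefs = parseAcceptEncoding(header || '');
  const qFor = (encoding) => {
    const exact = prefs.find((p) => p.encoding === encoding);
    if (exact) return exact.q;
    const wildcard = prefs.find((p) => p.encoding === '*');
    return wildcard ? wildcard.q : null; // null: client did not mention it
  };

  let best = null;
  for (const encoding of available) {
    const q = qFor(encoding);
    if (q !== null && q > 0 && (best === null || q > best.q)) {
      best = { encoding, q };
    }
  }
  if (best !== null) return best.encoding;

  // Unencoded content stays acceptable unless the client excluded it.
  if (available.includes('identity') && qFor('identity') !== 0) {
    return 'identity';
  }
  return null;
}
```

With the header from the example above, selectEncoding('gzip, br;q=0.9', ['br', 'gzip', 'identity']) selects gzip even though the server lists br first, because the client's quality factor for gzip is higher.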

Below is an implementation of this for AWS CloudFront, which is being used to handle traffic to https://blog.std.in. The code is MIT licensed and is derived from pgriess/http-content-negotiation-js.

How does this work?

The origin for https://blog.std.in/ is an S3 bucket configured for static website hosting. There are three different versions of each piece of content -- one unprocessed, one compressed with gzip, and another compressed with Brotli. The compressed content lives in shadow directory hierarchies under /gzip and /br respectively, allowing the path for compressed content to be computed by prepending the requisite directory to the original path.
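As a rough illustration of the offline processing and the shadow hierarchy, here is a sketch using Node's built-in zlib module. The helper names buildVariants and variantPath are hypothetical, and choosing the maximum gzip and Brotli levels is an assumption; the actual build step for this blog is not shown here.

```javascript
const zlib = require('zlib');

// Hypothetical build-time helper: produce all three versions of one asset.
// Compression happens offline, so the slowest, smallest settings are affordable.
function buildVariants(body) {
  return {
    identity: body,
    gzip: zlib.gzipSync(body, { level: zlib.constants.Z_BEST_COMPRESSION }),
    br: zlib.brotliCompressSync(body, {
      params: {
        [zlib.constants.BROTLI_PARAM_QUALITY]: zlib.constants.BROTLI_MAX_QUALITY,
      },
    }),
  };
}

// Map a request path to its location in the shadow hierarchy by
// prepending the encoding's directory; unprocessed content keeps its path.
function variantPath(encoding, uri) {
  return encoding === 'identity' ? uri : '/' + encoding + uri;
}
```

For example, variantPath('br', '/index.html') yields /br/index.html, while variantPath('identity', '/index.html') leaves the path unchanged.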

There are two handlers -- an origin request handler and an origin response handler. There are no viewer handlers, allowing CloudFront to skip this logic entirely when serving a cache hit. The origin request handler performs the content negotiation, parsing the Accept-Encoding header and comparing its requirements with what’s provided by the S3 bucket serving as the origin. It selects the best match and updates the URI to fetch from the origin. The origin response handler sets a Vary: Accept-Encoding header on the response indicating that content was negotiated based on the value of the Accept-Encoding header. The resulting response is then cached in CloudFront.
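The two handlers described above might look roughly like the following. This is a simplified sketch, not the deployed code: the export names, the AVAILABLE list, and the quality helper are hypothetical, and it assumes every object exists under both /br and /gzip. The event shape (event.Records[0].cf.request and event.Records[0].cf.response, with lowercase header names mapping to arrays of key/value pairs) follows the Lambda@Edge event structure.

```javascript
// Encodings assumed to exist at the origin, in server preference order.
const AVAILABLE = ['br', 'gzip'];

// Return the q-value the client assigned to `encoding` (0 if unacceptable).
function quality(headerValue, encoding) {
  let wildcard = 0;
  for (const part of headerValue.split(',')) {
    const [name, ...params] = part.split(';').map((s) => s.trim().toLowerCase());
    const qParam = params.find((p) => p.startsWith('q='));
    const q = qParam ? parseFloat(qParam.slice(2)) : 1;
    if (name === encoding) return Number.isNaN(q) ? 0 : q;
    if (name === '*') wildcard = Number.isNaN(q) ? 0 : q;
  }
  return wildcard;
}

// Origin request logic: negotiate, then point the URI at the shadow tree.
function rewriteRequest(request) {
  const header = request.headers['accept-encoding'];
  const value = header && header.length > 0 ? header[0].value : '';
  let best = null;
  let bestQ = 0;
  for (const encoding of AVAILABLE) {
    const q = quality(value, encoding);
    if (q > bestQ) {
      best = encoding;
      bestQ = q;
    }
  }
  // With no acceptable compressed encoding, fetch the unprocessed object.
  if (best !== null) request.uri = '/' + best + request.uri;
  return request;
}

// Origin response logic: tell downstream caches what was negotiated on.
function addVary(response) {
  response.headers['vary'] = [{ key: 'Vary', value: 'Accept-Encoding' }];
  return response;
}

// Lambda@Edge entry points; each would be deployed as its own function.
exports.originRequest = async (event) => rewriteRequest(event.Records[0].cf.request);
exports.originResponse = async (event) => addVary(event.Records[0].cf.response);
```

Keeping the logic in plain functions separate from the Lambda entry points makes it possible to exercise the negotiation locally without constructing full CloudFront events.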

Finally, the CloudFront distribution is configured with the “Cache Based on Selected Request Headers” setting set to include Accept-Encoding. This has the effect of CloudFront incorporating the browser’s Accept-Encoding header in its cache key when looking up a response. In addition, this prevents CloudFront from stripping the browser’s Accept-Encoding header before the origin request handler has a chance to execute.
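For readers who configure distributions through the API rather than the console, the relevant part of the cache behavior might look something like the fragment below. This is an assumption-laden sketch: it assumes the console setting corresponds to the ForwardedValues.Headers list of the CloudFront DistributionConfig, and the account ID, function names, and version numbers in the ARNs are placeholders rather than real resources.

```javascript
// Fragment of a DefaultCacheBehavior in a CloudFront DistributionConfig.
// Not a complete configuration; required fields such as TargetOriginId
// and ViewerProtocolPolicy are omitted.
const defaultCacheBehaviorFragment = {
  ForwardedValues: {
    QueryString: false,
    Cookies: { Forward: 'none' },
    // Include Accept-Encoding in the cache key and forward it to the
    // origin request handler.
    Headers: { Quantity: 1, Items: ['Accept-Encoding'] },
  },
  LambdaFunctionAssociations: {
    Quantity: 2,
    Items: [
      {
        EventType: 'origin-request',
        // Placeholder ARN (hypothetical function name and version).
        LambdaFunctionARN:
          'arn:aws:lambda:us-east-1:123456789012:function:negotiate-request:1',
      },
      {
        EventType: 'origin-response',
        // Placeholder ARN (hypothetical function name and version).
        LambdaFunctionARN:
          'arn:aws:lambda:us-east-1:123456789012:function:negotiate-response:1',
      },
    ],
  },
};
```

Note that only origin-facing event types appear in the associations, matching the absence of viewer handlers described earlier.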

A bug in CloudFront?

It is surprising to me that CloudFront does not add the Vary header automatically. Enabling “Cache Based on Selected Request Headers” and adding Accept-Encoding explicitly indicates that the content for a given URL may vary by the value of this header, which seems like a pretty clear signal that CloudFront should be adding Vary: Accept-Encoding to the response itself.
