Articles in the AWS category

  1. HTTP content negotiation on AWS CloudFront

    HTTP content negotiation is a mechanism by which web servers consider request headers in addition to the URL when determining which content to include in the response. A common use case for this is response body compression, wherein a server may decide to gzip the content if the request arrived with an Accept-Encoding: gzip header.
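    As a rough illustration of the idea, here is a minimal Python sketch of a server-side decision that gzips the body when the request advertises gzip. The `respond()` function and its dict-based headers are hypothetical. It deliberately only looks for the `gzip` token and ignores quality factors.

```python
from __future__ import annotations

import gzip


def respond(body: bytes, request_headers: dict[str, str]) -> tuple[dict[str, str], bytes]:
    """Return (response_headers, response_body), gzipping when the client accepts it.

    Assumes header names are already normalized to 'Accept-Encoding'.
    Quality factors such as 'gzip;q=0' are NOT honored by this naive check.
    """
    accept = request_headers.get("Accept-Encoding", "")
    # Keep only the coding names, dropping any ';q=...' parameters.
    codings = {part.split(";")[0].strip().lower() for part in accept.split(",")}
    # The response depends on Accept-Encoding, so tell downstream caches.
    headers = {"Vary": "Accept-Encoding"}
    if "gzip" in codings:
        headers["Content-Encoding"] = "gzip"
        return headers, gzip.compress(body)
    return headers, body
```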

    Support for content negotiation in HTTP servers is a mixed bag. Apache provides good built-in support for it. NGINX does not offer anything comparable, although a rough approximation is possible via configuration directives. Unfortunately, I can't find documentation on IIS. AWS S3 static website hosting, which is used to serve this blog, provides no facility for this whatsoever.

    Over the past few years, CDNs have evolved to help address this problem in a few ways.

    Most CDNs can compress content on the fly, even if the origin only serves uncompressed content. Support for gzip is de rigueur, and CloudFront supports Brotli as well. In practice, however, this can be limited. For example, AWS CloudFront won't compress anything under 1KB or over 10MB. In addition, compression typically gets more effective the more CPU you spend on it, though with diminishing returns. For example, running gzip at level 9 can produce content that is tens of percent smaller than level 1, but requires several times the processing power. As a result, CDNs are typically configured to run at fairly low compression levels.
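    You can see the level trade-off on your own assets with a small script like the one below. It compresses the same bytes at several gzip levels and reports size and elapsed time. The `compare_gzip_levels()` helper is hypothetical and uses only the standard library. Actual numbers depend heavily on the input.

```python
from __future__ import annotations

import gzip
import sys
import time


def compare_gzip_levels(data: bytes, levels=(1, 6, 9)) -> dict[int, tuple[int, float]]:
    """Map each gzip level to (compressed size in bytes, seconds taken)."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        # mtime=0 keeps the output deterministic across runs.
        compressed = gzip.compress(data, compresslevel=level, mtime=0)
        elapsed = time.perf_counter() - start
        # Sanity check: every level must round-trip to the original bytes.
        assert gzip.decompress(compressed) == data
        results[level] = (len(compressed), elapsed)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        payload = f.read()
    for level, (size, secs) in compare_gzip_levels(payload).items():
        print(f"level {level}: {size} bytes in {secs * 1000:.2f} ms")
```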

    Recently, CDNs have also begun to allow applications to run business logic at the edge. Cloudflare Workers, AWS Lambda@Edge and Fastly VCL are all examples of this.

    Felice Geracitano had the clever idea of using Lambda@Edge on AWS CloudFront to implement a bare-bones content negotiation scheme for the purpose of supporting Brotli. While there are some issues with his implementation, the concept of performing content negotiation in JavaScript on the CDN and using the result to drive fetching a different resource from the origin is a powerful one.

    What does a good solution for this look like?

    • The origin server does not need to support content negotiation. This is both cheaper to operate and allows for offline processing of assets (e.g. to compress using gzip level 9 or Zopfli).
    • The content negotiation process should respect quality factors in the HTTP request headers, e.g. Accept-Encoding: gzip, br;q=0.9 indicates that if content encodings for both gzip and br are available, it prefers gzip.
    • The CDN should serve content with a correct Vary header to ensure that downstream caches are not confused by the content negotiation process.
    • The results of JavaScript content negotiation logic should be cached by the CDN, and only re-executed on CDN cache misses.
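    Respecting quality factors means parsing each `coding;q=value` entry instead of just searching for a substring. Below is a minimal Python sketch of that selection step. It is not the API of pgriess/http-content-negotiation-js, which is JavaScript. The function names and the assumption that `available` is ordered by server preference are illustrative. The defaults for `identity` and `*` loosely follow RFC 7231, section 5.3.4.

```python
from __future__ import annotations

from typing import Optional


def parse_accept_encoding(header: str) -> dict[str, float]:
    """Parse e.g. 'gzip, br;q=0.9' into {'gzip': 1.0, 'br': 0.9}."""
    prefs = {}
    for entry in header.split(","):
        coding, *params = [p.strip() for p in entry.split(";")]
        if not coding:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs[coding.lower()] = q
    return prefs


def negotiate(header: str, available: list[str]) -> Optional[str]:
    """Pick the acceptable coding with the highest q from `available`.

    `available` is ordered by server preference; ties go to the earlier entry.
    Returns None if nothing in `available` is acceptable.
    """
    prefs = parse_accept_encoding(header)
    best, best_q = None, 0.0
    for coding in available:
        if coding in prefs:
            q = prefs[coding]
        elif "*" in prefs:
            q = prefs["*"]  # wildcard covers codings not listed explicitly
        else:
            # 'identity' is acceptable unless excluded; other codings are not.
            q = 1.0 if coding == "identity" else 0.0
        if q > best_q:
            best, best_q = coding, q
    return best
```

    With `available` listed as `["br", "gzip", "identity"]`, an unqualified `gzip, br` header ties at q=1 and resolves to `br`, the server's first choice. By contrast, `gzip, br;q=0.9` resolves to `gzip`.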

    Below is an implementation of this for AWS CloudFront, which is being used to handle traffic to https://blog.std.in. The code is MIT licensed and is derived from pgriess/http-content-negotiation-js.

    How does this work?

    The origin for https://blog.std.in/ is an S3 bucket configured for static website hosting. There are 3 different versions of each piece of content: one unprocessed, one compressed with gzip, and another compressed with Brotli. The compressed content lives in shadow directory hierarchies under /gzip and /br respectively, so the path for compressed content can be computed by prepending the requisite directory.
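    The shadow hierarchy can be built offline before uploading. Here is a minimal Python sketch of that step. It assumes a local copy of the site, and `build_gzip_shadow()` and `origin_path()` are hypothetical names. It only emits the `/gzip` tree because Brotli is not in the Python standard library. The third-party `brotli` package would be one way to add `/br`.

```python
from __future__ import annotations

import gzip
import shutil
from pathlib import Path


def origin_path(uri: str, coding: str) -> str:
    """Map a request URI to its object path for the chosen content coding."""
    return uri if coding == "identity" else f"/{coding}{uri}"


def build_gzip_shadow(site_root: Path, out_root: Path) -> list[Path]:
    """Copy site_root to out_root and write gzip -9 copies under out_root/gzip."""
    # Snapshot the file list first so newly written files are never revisited.
    sources = [p for p in site_root.rglob("*") if p.is_file()]
    written = []
    for src in sources:
        rel = src.relative_to(site_root)
        plain = out_root / rel
        packed = out_root / "gzip" / rel
        plain.parent.mkdir(parents=True, exist_ok=True)
        packed.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, plain)
        # Level 9 is affordable here because this runs once, not per request.
        packed.write_bytes(gzip.compress(src.read_bytes(), compresslevel=9, mtime=0))
        written += [plain, packed]
    return written
```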

    There are two handlers: an origin request handler and an origin response handler. There are no viewer handlers, allowing CloudFront to skip this logic entirely when serving a cache hit. The origin request handler performs the content negotiation, parsing the Accept-Encoding header and comparing its requirements with what’s provided by the S3 bucket serving as the origin. It selects the best match and updates the URI to fetch from the origin. The origin response handler sets a Vary: Accept-Encoding header on the response indicating that content was negotiated based on the value of the Accept-Encoding header. The resulting response is then cached in CloudFront.
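    The handlers described here run JavaScript. Purely as an illustration, the same two steps can be sketched for the Lambda@Edge Python runtime. The event shape comes from CloudFront: `event['Records'][0]['cf']`, with headers keyed by lowercase name and holding lists of `{'key', 'value'}` dicts. Several details are assumptions: the `AVAILABLE` list, the compact `choose()` helper, the identity fallback, and the idea that compressed S3 objects already carry the right `Content-Encoding` metadata.

```python
# Assumption: objects exist at /path, /gzip/path and /br/path in the origin bucket,
# and the compressed ones were uploaded with matching Content-Encoding metadata.
AVAILABLE = ["br", "gzip", "identity"]  # ordered by server preference


def choose(accept_encoding: str) -> str:
    """Compact q-value negotiation; falls back to identity if nothing matches."""
    prefs = {}
    for entry in accept_encoding.split(","):
        coding, *params = entry.split(";")
        coding = coding.strip().lower()
        if not coding:
            continue
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[coding] = q
    # Strictly, "nothing acceptable" would warrant a 406; serving identity is simpler.
    best, best_q = "identity", 0.0
    for coding in AVAILABLE:
        default = 1.0 if coding == "identity" else 0.0
        q = prefs.get(coding, prefs.get("*", default))
        if q > best_q:
            best, best_q = coding, q
    return best


def origin_request_handler(event, context):
    """Runs only on cache misses: rewrite the URI to the negotiated variant."""
    request = event["Records"][0]["cf"]["request"]
    # Present only because the distribution forwards Accept-Encoding.
    header = request["headers"].get("accept-encoding", [{"value": ""}])[0]["value"]
    coding = choose(header)
    if coding != "identity":
        request["uri"] = f"/{coding}{request['uri']}"
    return request


def origin_response_handler(event, context):
    """Add Vary: Accept-Encoding before CloudFront caches the response."""
    response = event["Records"][0]["cf"]["response"]
    vary = response["headers"].setdefault("vary", [])
    if not any("accept-encoding" in h["value"].lower() for h in vary):
        vary.append({"key": "Vary", "value": "Accept-Encoding"})
    return response
```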

    Finally, the CloudFront distribution is configured with the “Cache Based on Selected Request Headers” setting set to include Accept-Encoding. This has the effect of CloudFront incorporating the browser’s Accept-Encoding header in its cache key when looking up a response. In addition, this prevents CloudFront from stripping the browser’s Accept-Encoding header before the origin request handler has a chance to execute.
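    To make the caching behavior concrete, here is a toy Python model of a cache whose key includes the `Accept-Encoding` value. This is not CloudFront code. In the model, the negotiation step runs only on a miss. `make_edge()`, `fetch` and `negotiate` are hypothetical stand-ins for the CDN, the S3 origin and the origin request handler.

```python
from __future__ import annotations

from typing import Callable


def make_edge(fetch: Callable[[str], bytes], negotiate: Callable[[str], str]):
    """Return (get, stats) for a toy edge cache keyed on (uri, Accept-Encoding)."""
    cache: dict[tuple[str, str], bytes] = {}
    stats = {"misses": 0}

    def get(uri: str, accept_encoding: str) -> bytes:
        key = (uri, accept_encoding)  # the header value is part of the cache key
        if key not in cache:
            # Cache miss: only now does the origin request logic run.
            stats["misses"] += 1
            coding = negotiate(accept_encoding)
            origin_uri = uri if coding == "identity" else f"/{coding}{uri}"
            cache[key] = fetch(origin_uri)
        return cache[key]

    return get, stats
```

    Because the model uses the raw header value as the key, `gzip, br` and `br, gzip` occupy separate cache entries even though they negotiate to the same variant.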

    A bug in CloudFront?

    It is surprising to me that CloudFront does not add the Vary header automatically. Enabling “Cache Based on Selected Request Headers” and adding Accept-Encoding explicitly indicates that the content for a given URL may vary by the value of this header. That seems like a pretty clear signal that CloudFront should add a Vary: Accept-Encoding header to the response itself.

  2. Simple AWS Request Signing

    Amazon Web Services has been online for more than a decade, and now supports a dizzying array of services backed by a relatively easy-to-use REST API. Unfortunately, while the ecosystem of tools and libraries has expanded exponentially, working with these services tends to require a deep scaffolding of dependencies. Lately I've been playing with OpenWRT on my home network and wanted to make some AWS REST API requests. The device has a fairly generous 128MB of storage, but even so, the official awscli Python package and its dependencies weigh in at over 30MB, not including Python itself.

    Introducing aws4sign, a zero-dependency, MIT-licensed, single-file Python 2 library and CLI tool that computes the AWS v4 signature.
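    For readers unfamiliar with the v4 scheme, its core is a chain of HMAC-SHA256 operations. The chain scopes the secret key to a date, region and service, and one final HMAC over the "string to sign" produces the signature. Below is a standard-library Python 3 sketch of just that part. The function names are illustrative and are not aws4sign's.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """date_stamp is YYYYMMDD; each HMAC step narrows the scope of the key."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def signature(signing_key: bytes, string_to_sign: str) -> str:
    """Hex-encoded HMAC-SHA256 that ends up in the Authorization header."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```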

    There are two ways to use this.

    Invoke as an executable

    The aws4sign.py file itself contains a simple __main__ that makes it easy to drive signature generation from anywhere that can invoke an executable.

    The following bash snippet calls the AWS Route53 hostedzone API using curl:

    The only thing of note here is that we end up calling aws4sign.py once for each header that we need to pass to curl, selecting the header to display using the -p option. If we omitted this option, all headers would be emitted (one per line), but parsing these is a bit more involved than desirable for such a simple example. Instead, we just emit a single header each time and use cut to grab its value. Note, however, that because the AWS signature algorithm uses a timestamp, we need to ensure that aws4sign.py has a constant notion of time across invocations. We do this by computing the current time up-front and passing it using the -t option.
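    The same concern applies in any language: read the clock once and derive every time-dependent value from that single instant. SigV4 needs two forms of it: the `X-Amz-Date` timestamp and the date used in the credential scope. Here is a tiny Python illustration. The helper name is hypothetical, and it is not tied to the format of aws4sign's `-t` option.

```python
from __future__ import annotations

from datetime import datetime, timezone


def sigv4_time_fields(now: datetime | None = None) -> tuple[str, str]:
    """Return (X-Amz-Date value, credential-scope date) from ONE instant.

    If `now` is supplied it must already be in UTC.
    """
    if now is None:
        now = datetime.now(timezone.utc)  # read the clock exactly once
    return now.strftime("%Y%m%dT%H%M%SZ"), now.strftime("%Y%m%d")
```

    Calling `amz_date, date_stamp = sigv4_time_fields()` once and reusing both values for every header keeps them consistent, even if the signing steps straddle a second or midnight boundary.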

    Integrated with Python code

    Copy and paste the single 100-line aws4_signature_parts() function into your code. Or integrate it into a module of your own. Whatever. No dependencies. No mucking with PIP. No incompatible licenses.

    The following code invokes the Route53 hostedzone API using urllib2:

    The first two return values from aws4_signature_parts() can probably be ignored by most users: they are mostly there to provide visibility into the signing process for validation and testing purposes.
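    For comparison, here is an end-to-end Python 3 sketch using `urllib.request`, the successor to urllib2. It signs a bodyless GET to the Route53 hostedzone endpoint. It is not aws4sign's code, and it handles no query string, no request body and no session token. In the same spirit of debuggability, it returns its intermediate canonical request and string to sign. The credentials in the `__main__` block are placeholders.

```python
import hashlib
import hmac
import urllib.request
from datetime import datetime, timezone


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signed_get(host, path, region, service, access_key, secret_key, now):
    """Build a SigV4-signed GET with no query string and an empty body.

    `path` must already be URI-encoded; `now` must be an aware UTC datetime.
    Returns (canonical_request, string_to_sign, urllib.request.Request).
    """
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    signed_headers = "host;x-amz-date"
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    canonical_request = "\n".join([
        "GET", path, "",  # method, canonical URI, empty query string
        canonical_headers, signed_headers,
        hashlib.sha256(b"").hexdigest(),  # hash of the empty payload
    ])
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    key = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, service, "aws4_request"):
        key = _hmac_sha256(key, part)
    sig = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    auth = (f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={sig}")
    # urllib fills in Host from the URL, matching the signed host header.
    req = urllib.request.Request(
        f"https://{host}{path}",
        headers={"X-Amz-Date": amz_date, "Authorization": auth},
        method="GET",
    )
    return canonical_request, string_to_sign, req


if __name__ == "__main__":
    _, _, req = signed_get("route53.amazonaws.com", "/2013-04-01/hostedzone",
                           "us-east-1", "route53", "AKIDEXAMPLE", "example-secret",
                           datetime.now(timezone.utc))
    print(req.get_header("Authorization"))  # send with urllib.request.urlopen(req)
```

    With real credentials, passing the returned request to `urllib.request.urlopen()` would send it.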

    That's it! Happy signing.
