Articles in the AWS category

  1. HTTP content negotiation on AWS CloudFront Part 2

    My earlier post on HTTP content negotiation in AWS CloudFront covered support for negotiating the response encoding using the request’s Accept-Encoding header. This post builds on that by adding support for negotiating Content-Type using the request’s Accept header.

    As Ilya Grigorik laid out several years ago, content negotiation is tricky to get right, and yet particularly important when serving images. This is even more relevant today, as recently the browser ecosystem’s support for WebP has taken a big step forward -- Edge just shipped support and work on Firefox support has resumed after a long hiatus. Furthermore, there continue to be interesting new image formats on the horizon such as AVIF and HEIF.

    There are several reasons why Content-Type negotiation is more difficult than Content-Encoding. First, the media types being negotiated are hierarchical, supporting a type and subtype model, e.g. image/png. In addition, wildcards are supported, e.g. image/*. Finally, unlike Accept-Encoding, where browsers explicitly send all of the encodings that they support, browsers tend not to do this with the Accept header. For example, Firefox sends Accept: */* when requesting images. This gives the HTTP server no indication of what the browser actually supports -- by the RFC the server would be allowed to return any content at all. As a result, servers are typically either conservative, returning only formats which are highly likely to be supported like image/jpeg, or fall back to heuristics like User-Agent sniffing to detect specific browser builds which support a server-preferred content type.
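
    To make the type/subtype hierarchy and the wildcards concrete, here is a minimal Python sketch of matching a media type against a single media range. It is an illustration only, not code from the library discussed in this post; the function names parse_media_range and specificity are hypothetical.

```python
def parse_media_range(item):
    """Split a media range such as 'image/*;q=0.8' into (type, subtype, q)."""
    parts = [p.strip() for p in item.split(";")]
    mtype, _, subtype = parts[0].lower().partition("/")
    q = 1.0  # a range without a q parameter has quality 1
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "q":
            q = float(value)
    return mtype, subtype, q


def specificity(media_range, media_type):
    """Return 3 for an exact match, 2 for type/*, 1 for */*, 0 for no match."""
    rtype, rsub, _ = media_range
    mtype, _, msub = media_type.lower().partition("/")
    if (rtype, rsub) == (mtype, msub):
        return 3
    if rtype == mtype and rsub == "*":
        return 2
    if (rtype, rsub) == ("*", "*"):
        return 1
    return 0
```

    A header like Accept: */* matches every media type at the lowest specificity, which is why it tells the server so little.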

    What is an HTTP server implementer to do? Apache’s mod_negotiation has a fairly sophisticated set of heuristics for supporting content negotiation which covers some of this including working around usage of overly-permissive wildcards.

    With AWS CloudFront, we can implement something similar to drive this process on Lambda@Edge. The code below is being used to serve this article, and will cause the image of a dog to be returned as WebP if your browser supports it.

    How does this work?

    Similar to my previous post, this uses the zero-dependency, MIT-licensed http-content-negotiation-js library to run the content negotiation process. This library implements all of the requisite media range parsing and semantics, as well as some of the heuristics from mod_negotiation. For example, it treats matches against a subtype wildcard as having an implicit q-value of 0.02 if none of the media ranges in the request have an explicit q-value specified.

    First, the SERVER_IMAGE_TYPES list of ValueTuple objects is created to represent our content type preferences for images. Note that we indicate that we have a slight preference for image/jpeg (q-value 0.99) over image/webp (q-value 0.98). This handles user agents that do not send an explicit indication of support for one of the image types (e.g. Firefox sends Accept: */*). In this case, we’ll prefer the more conservative image/jpeg. For user agents that do send specific image types (e.g. Chrome sends Accept: image/webp,image/apng,image/*,*/*;q=0.8), we’ll honor their request for a more specific type.
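
    The interaction between the server's q-values and the wildcard heuristic can be sketched in a few lines of Python. This is a rough approximation under stated assumptions, not the library's implementation: plain tuples stand in for ValueTuple objects, the 0.02 value for type/* comes from the description above, and the 0.01 value for */* is borrowed from Apache's documented behaviour. The sketch only applies the implicit values when no range carries an explicit q-value, so headers that mix wildcards with explicit q-values (such as Chrome's) need the library's fuller rules, which are not reproduced here.

```python
# Server preferences as (media type, q-value); plain tuples, not ValueTuple.
SERVER_IMAGE_TYPES = [("image/jpeg", 0.99), ("image/webp", 0.98)]


def parse_accept(header):
    """Return a list of (type, subtype, q, has_explicit_q) tuples."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        mtype, _, subtype = parts[0].lower().partition("/")
        q, explicit = 1.0, False
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q, explicit = float(value), True
        ranges.append((mtype, subtype, q, explicit))
    return ranges


def client_q(ranges, media_type):
    """Quality the client assigns to media_type, using the most specific range."""
    any_explicit = any(r[3] for r in ranges)
    mtype, _, msub = media_type.partition("/")
    best = (0, 0.0)  # (specificity, q)
    for rtype, rsub, q, _ in ranges:
        if (rtype, rsub) == (mtype, msub):
            candidate = (3, q)
        elif rtype == mtype and rsub == "*":
            # Assumed heuristic: implicit 0.02 when no q-values were sent.
            candidate = (2, q if any_explicit else 0.02)
        elif (rtype, rsub) == ("*", "*"):
            # Assumed heuristic: implicit 0.01 when no q-values were sent.
            candidate = (1, q if any_explicit else 0.01)
        else:
            continue
        best = max(best, candidate)
    return best[1]


def negotiate(header, server_types=SERVER_IMAGE_TYPES):
    """Pick the server type with the highest client q times server q."""
    ranges = parse_accept(header)
    scored = [(client_q(ranges, t) * sq, t) for t, sq in server_types]
    score, best = max(scored)
    return best if score > 0 else None
```

    With these assumptions, negotiate("*/*") falls back to image/jpeg because only the server's slight preference separates the two candidates, while a header that names image/webp outright and sends no q-values selects image/webp.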

    Next, the request handler looks for request URLs ending in .jpg and interprets this as a request for an image, performing type negotiation. We then rewrite the URL for the upstream request with a new file extension based on the negotiated content type.
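
    The URL rewrite itself is simple. A minimal Python sketch, assuming the origin stores each variant next to the original under the same base name; the EXTENSIONS mapping and the function name are hypothetical:

```python
import posixpath

# Hypothetical mapping from negotiated media type to file extension.
EXTENSIONS = {"image/jpeg": ".jpg", "image/webp": ".webp"}


def rewrite_image_uri(uri, negotiated_type):
    """Swap the .jpg extension for the one matching the negotiated type."""
    root, ext = posixpath.splitext(uri)
    if ext.lower() != ".jpg":
        return uri  # not an image request; leave the URI untouched
    return root + EXTENSIONS.get(negotiated_type, ext)
```

    For example, rewrite_image_uri("/img/dog.jpg", "image/webp") yields "/img/dog.webp", and any URI that does not end in .jpg passes through unchanged.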

    Finally, it’s worth noting that while type and encoding negotiation are not mutually exclusive, it is generally not worthwhile spending CPU cycles applying a content encoding such as gzip to images, since formats like JPEG and WebP are already compressed. Because of this, we only bother performing encoding negotiation if we’re not serving an image.

  2. HTTP content negotiation on AWS CloudFront

    HTTP content negotiation is a mechanism by which web servers consider request headers in addition to the URL when determining which content to include in the response. A common use case for this is response body compression, wherein a server may decide to gzip the content if the request arrived with an Accept-Encoding: gzip header.

    Support for content negotiation in HTTP servers is a mixed bag. Apache provides good built-in support for this. NGINX does not offer anything comparable although a rough approximation is possible via configuration directives. Unfortunately I can’t find documentation on IIS. AWS S3 static website hosting, which is used to serve this blog, provides no facility for this whatsoever.

    Over the past few years, CDNs have evolved to help address this problem in a few ways.

    Most CDNs can compress content on the fly, even if the origin only serves uncompressed. Support for gzip is de rigueur, with CloudFront supporting Brotli as well. In practice, however, this can be limited. For example, AWS CloudFront won't compress anything under 1KB or over 10MB. In addition, compression is typically more effective the more CPU you spend on it, though this effect is non-linear. For example, running gzip at level 9 can produce content that is tens of percent smaller than level 1, but requires several times the processing power. As a result, CDNs are typically configured to run at fairly low optimization levels.
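
    One way to see the size trade-off for your own assets is to compress them at several levels and compare. The Python sketch below uses the standard zlib module, which implements the same DEFLATE algorithm and level scale as gzip; the sample data is a made-up, highly repetitive placeholder, so substitute real content before drawing conclusions.

```python
import zlib


def compressed_sizes(data, levels=(1, 6, 9)):
    """Map each compression level to the compressed size in bytes."""
    return {level: len(zlib.compress(data, level)) for level in levels}


# Placeholder input; real HTML, CSS or JavaScript will behave differently.
sample = b"<li><a href='/posts/'>HTTP content negotiation</a></li>\n" * 500

for level, size in compressed_sizes(sample).items():
    print("level", level, "->", size, "bytes of", len(sample))
```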

    Recently CDNs have also begun to allow applications to run business logic at the edge. Cloudflare Workers, AWS Lambda@Edge and Fastly VCL are all examples of this.

    Felice Geracitano had the clever idea to use Lambda@Edge on AWS CloudFront to implement a bare bones content negotiation scheme for the purpose of supporting Brotli. While there are some issues with his implementation, the concept of performing content negotiation in JavaScript on the CDN and using the result to drive fetching a different resource from the origin is a powerful one.

    What does a good solution for this look like?

    • The origin server does not need to support content negotiation. This is both cheaper to operate and allows for offline processing of assets (e.g. to compress using gzip level 9 or Zopfli).
    • The content negotiation process should respect quality factors in the HTTP request headers, e.g. Accept-Encoding: gzip, br;q=0.9 indicates that if content encodings for both gzip and br are available, the client prefers gzip.
    • The CDN should serve content with a correct Vary header to ensure that downstream caches are not confused by the content negotiation process.
    • The results of JavaScript content negotiation logic should be cached by the CDN, and only re-executed on CDN cache misses.
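
    The quality-factor requirement in the list above can be sketched as follows. This is a simplified Python illustration rather than the code running on this blog: the function names are hypothetical, ties are broken by the server's own ordering, and identity (no compression) is treated purely as a fallback.

```python
def parse_accept_encoding(header):
    """Return a dict mapping each content coding to its q-value."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        if not coding:
            continue
        q = 1.0  # a coding without a q parameter has quality 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        prefs[coding] = q
    return prefs


def choose_encoding(header, available=("br", "gzip")):
    """Pick the available coding the client rates highest.

    `available` is listed in server preference order, which breaks ties.
    Returns "identity" when no compressed coding is acceptable, or None
    when the client has refused identity as well.
    """
    prefs = parse_accept_encoding(header)
    wildcard = prefs.get("*", 0.0)
    best, best_q = None, 0.0
    for coding in available:
        q = prefs.get(coding, wildcard)
        if q > best_q:
            best, best_q = coding, q
    if best is not None:
        return best
    identity_q = prefs.get("identity", prefs.get("*", 1.0))
    return "identity" if identity_q > 0 else None
```

    Given Accept-Encoding: gzip, br;q=0.9 this returns gzip even though the server lists br first, because the client's explicit q-value outranks the server's ordering.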

    Below is an implementation of this for AWS CloudFront, which is being used to handle traffic to https://blog.std.in. The code is MIT licensed and is derived from pgriess/http-content-negotiation-js.

    How does this work?

    The origin for https://blog.std.in/ is an S3 bucket configured for static website hosting. There are 3 different versions of each piece of content -- one un-processed, one compressed with gzip, and another compressed with Brotli. The compressed content lives in a shadow directory hierarchy under /gzip and /br respectively, allowing the path for compressed content to be computed by prepending the requisite directory.

    There are two handlers -- an origin request handler and an origin response handler. There are no viewer handlers, allowing CloudFront to skip this logic entirely when serving a cache hit. The origin request handler performs the content negotiation, parsing the Accept-Encoding header and comparing its requirements with what’s provided by the S3 bucket serving as the origin. It selects the best match and updates the URI to fetch from the origin. The origin response handler sets a Vary: Accept-Encoding header on the response indicating that content was negotiated based on the value of the Accept-Encoding header. The resulting response is then cached in CloudFront.
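
    The shape of the two handlers can be sketched in Python, one of the runtimes Lambda@Edge supports (the handlers serving this blog are JavaScript). The event layout follows the Lambda@Edge event structure; pick_encoding is a deliberately naive placeholder that ignores q-values, and the /gzip and /br prefixes follow the shadow directory layout described above.

```python
def pick_encoding(accept_encoding):
    """Placeholder: first of br, gzip named in the header; q-values ignored."""
    offered = {
        item.split(";")[0].strip().lower()
        for item in accept_encoding.split(",")
    }
    for coding in ("br", "gzip"):
        if coding in offered:
            return coding
    return None


def origin_request_handler(event, context):
    """Rewrite the origin URI to point at the negotiated shadow directory."""
    request = event["Records"][0]["cf"]["request"]
    values = request["headers"].get("accept-encoding", [])
    header = ",".join(v["value"] for v in values)
    coding = pick_encoding(header)
    if coding:
        request["uri"] = "/" + coding + request["uri"]
    return request


def origin_response_handler(event, context):
    """Tell downstream caches that the response depends on Accept-Encoding."""
    response = event["Records"][0]["cf"]["response"]
    response["headers"]["vary"] = [
        {"key": "Vary", "value": "Accept-Encoding"}
    ]
    return response
```

    A request for /index.html carrying Accept-Encoding: gzip, br would be forwarded to the origin as /br/index.html, and requests without the header pass through untouched. A real handler would also need to set Content-Encoding on the response unless the origin objects already carry it.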

    Finally, the CloudFront distribution is configured with the “Cache Based on Selected Request Headers” setting set to include Accept-Encoding. This has the effect of CloudFront incorporating the browser’s Accept-Encoding header in its cache key when looking up a response. In addition, this prevents CloudFront from stripping the browser’s Accept-Encoding header before the origin request handler has a chance to execute.

    A bug in CloudFront?

    It is surprising to me that CloudFront does not add the Vary header automatically, since enabling “Cache Based on Selected Request Headers” and adding Accept-Encoding explicitly indicates that the content for the given URL may vary by the value of this header. This seems like a pretty clear indication that CloudFront should be adding a Vary: Accept-Encoding header to the response automatically.

  3. Simple AWS Request Signing

    Amazon Web Services has been online for more than a decade, and now supports a dizzying array of services backed by a relatively easy-to-use REST API. Unfortunately, while the ecosystem of tools and libraries has expanded exponentially, working with these services tends to require a deep scaffolding of dependencies. Lately I've been playing with OpenWRT on my home network and wanted to make some AWS REST API requests. The device has a fairly generous 128MB of storage, but even so, pulling in the official awscli Python package with its dependencies weighs in at over 30MB, not including Python itself.

    Introducing aws4sign, a zero-dependency, MIT-licensed, single-file Python2 library and CLI tool that computes the AWS v4 signature.

    There are two ways to use this.

    Invoke as an executable

    The aws4sign.py file itself contains a simple __main__ that makes it easy to drive signature generation from anywhere that can invoke an executable.

    The following bash snippet calls the AWS Route53 hostedzone API using curl:

    The only thing of note here is that we end up calling aws4sign.py once for each header that we need to pass to curl, selecting the header to display using the -p option. If we omitted this option, all headers would be emitted (one per line), but parsing these is a bit more involved than desirable for such a simple example. Instead, we just emit a single header each time and use cut to grab its value. Note, however, that because the AWS signature algorithm uses a timestamp, we need to ensure that aws4sign.py has a constant notion of time across invocations. We do this by computing the current time up-front and passing it using the -t option.
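
    To see why the timestamp has to stay constant, it helps to look at where it enters the signature. The following Python 3 sketch follows the publicly documented AWS Signature Version 4 key derivation; it is not the aws4sign code, the function names are hypothetical, and building the canonical request is left out.

```python
import hashlib
import hmac


def _hmac(key, msg):
    """HMAC-SHA256 of a text message under a bytes key."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key, date_stamp, region, service):
    """Derive the per-day, per-region, per-service signing key."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def signature(secret_key, amz_date, region, service, canonical_request):
    """Sign a canonical request; amz_date looks like 20150830T123600Z."""
    date_stamp = amz_date[:8]
    scope = "/".join([date_stamp, region, service, "aws4_request"])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,  # the timestamp is part of what gets signed
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    key = signing_key(secret_key, date_stamp, region, service)
    return hmac.new(
        key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()
```

    Because amz_date feeds both the signing key and the string to sign, two invocations that observe different clock values produce headers that do not belong to the same signature.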

    Integrated with Python code

    Copy and paste the single 100-line aws4_signature_parts() function into your code. Or integrate it into a module of your own. Whatever. No dependencies. No mucking with pip. No incompatible licenses.

    The following code invokes the Route53 hostedzone API using urllib2:

    The first two return values from aws4_signature_parts() should probably be ignored by most users -- they are mostly in place to provide visibility into the signing process for validation and testing purposes.

    That's it! Happy signing.
