November 23 2020
In my earlier posts about HTTP content negotiation (see part 1 and part 2), I built a negotiation engine that runs on the Lambda@Edge AWS proxy, with configuration and policy specified inline in the implementation.
While this works for my specific needs, it has some drawbacks that prevent it from being deployed unmodified in front of an arbitrary static site.
.jpg suffix.
We can address these issues by moving policy and configuration out of the engine.
Apache's content negotiation implementation supports this by computing a list of possible representations for a given resource and then feeding that list into a data-driven algorithm described here. Apache provides two mechanisms for building this list:
Multiviews: At request time, Apache applies fixed transformations to the request path (e.g. mapping foo.html to foo.html.gz). This requires I/O to discover and parse all existing representations. To do the same, a proxy would have to perform a remote directory scan, for which there is no standard HTTP mechanism. Without that, it must guess at some common transformations (e.g. try appending .gz) and issue a large number of requests upstream to determine which are viable.
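To make the cost of guessing concrete, here is a minimal sketch of how a proxy might enumerate the paths it has to probe. The candidatePaths function and the suffix list are assumptions for illustration only; they are not part of Apache or http-negotiator.

```javascript
'use strict';

// Hypothetical list of transformations the proxy is willing to guess at.
const GUESSED_SUFFIXES = ['.gz', '.br'];

// Return every upstream path the proxy would have to probe for one request:
// the original path plus one candidate per guessed suffix.
function candidatePaths(requestPath, suffixes = GUESSED_SUFFIXES) {
    return [requestPath, ...suffixes.map((s) => requestPath + s)];
}

console.log(candidatePaths('/foo.html'));
// [ '/foo.html', '/foo.html.gz', '/foo.html.br' ]
```

Each candidate costs one upstream request before negotiation can even start, and the list grows with every transformation the proxy learns to guess.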
Typemaps: At request time, Apache loads a typemap file corresponding to the request path. The typemap contains an operator-specified list of representations, the URLs from which to retrieve them, and associated metadata, including header attributes. This list can be generated at any time, but static sites will likely want to do this as part of their build process. The ASCII format is simple enough that it can be generated with a trivial shell script. Since typemaps are fetched from the origin over HTTP, they can be cached along with other resources.
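As a rough illustration of how little is involved in generating one, the sketch below emits a typemap from a list of variants. The record layout (a URI line followed by header lines, with records separated by a blank line) follows Apache's mod_negotiation documentation; the buildTypemap function and the variants shown are hypothetical.

```javascript
'use strict';

// Build the text of a typemap from an array of variant descriptions.
// Each variant needs a `uri`; `headers` is an optional object mapping
// header names (e.g. Content-Type, Content-Encoding) to their values.
function buildTypemap(variants) {
    return variants
        .map(({ uri, headers = {} }) => {
            const lines = ['URI: ' + uri];
            for (const [name, value] of Object.entries(headers)) {
                lines.push(name + ': ' + value);
            }
            return lines.join('\n');
        })
        .join('\n\n') + '\n';
}

// Hypothetical variants for a single page: plain and gzip-compressed.
const typemap = buildTypemap([
    { uri: 'foo.html', headers: { 'Content-Type': 'text/html' } },
    {
        uri: 'foo.html.gz',
        headers: { 'Content-Type': 'text/html', 'Content-Encoding': 'gzip' },
    },
]);

console.log(typemap);
```

A build step could write this string to foo.html.var next to the files it describes.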
Typemaps solve all of the problems called out at the beginning of this post -- the metadata that allows the content engine to negotiate representations is computed outside of the proxy, in a format that is easy to both generate and parse. Because there is no prescribed mechanism for generating this file, its contents can reflect whatever policy the origin wishes.
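Parsing is similarly small. The following is a minimal sketch of a parser for simple records of this kind. It is not the typemapParse implementation from http-negotiator, the parseTypemapSketch name is made up, and it deliberately ignores parts of Apache's format such as Body: sections and quality parameters.

```javascript
'use strict';

// Parse typemap text into an array of { uri, headers } records.
// Records are separated by one or more blank lines; each line within a
// record has the form "Name: value". Header names are lower-cased.
function parseTypemapSketch(text) {
    return text
        .split(/\r?\n(?:[ \t]*\r?\n)+/)
        .map((block) => block.trim())
        .filter((block) => block.length > 0)
        .map((block) => {
            const record = { uri: null, headers: new Map() };
            for (const line of block.split(/\r?\n/)) {
                if (line.startsWith('#')) continue; // comment line
                const idx = line.indexOf(':');
                if (idx < 0) continue; // skip malformed lines
                const name = line.slice(0, idx).trim().toLowerCase();
                const value = line.slice(idx + 1).trim();
                if (name === 'uri') {
                    record.uri = value;
                } else {
                    record.headers.set(name, value);
                }
            }
            return record;
        });
}

const records = parseTypemapSketch(
    'URI: foo.html\nContent-Type: text/html\n\n' +
    'URI: foo.html.gz\nContent-Type: text/html\nContent-Encoding: gzip\n');

console.log(records.map((r) => r.uri));
// [ 'foo.html', 'foo.html.gz' ]
```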
I've updated http-negotiator to support typemaps, and now the Lambda@Edge request handler for this blog has no site-specific policy or configuration.
It now looks like the following:
'use strict';

const {
    awsPerformTypemapNegotiation,
    typemapParse } = require('http-negotiator');
const http = require('http');
const { URL } = require('url');

const DEFAULT_DOCUMENT = 'index.html';
const ORIGIN_BASE_URL = 'http://s3.us-west-2.amazonaws.com/blog.std.in';
const SERVER_ENCODING_WHITELIST = new Set();
const SERVER_TYPE_WHITELIST = new Set(['image/gif', 'image/jpeg', 'text/html', 'text/plain']);

exports.handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;

    // Pass through requests to fetch the typemap
    if (request.uri.endsWith('.var')) {
        callback(null, request);
        return;
    }

    // Generate the URL for the typemap; handle default documents
    let varUri = ORIGIN_BASE_URL + request.uri;
    if (varUri.endsWith('/')) {
        varUri += DEFAULT_DOCUMENT;
    }
    varUri += '.var';

    const bodyBuffers = [];
    http.get(varUri, (res) => {
        // No typemap found; pass through the request to the original object
        if (res.statusCode !== 200) {
            // Discard the unread body so the socket can be released
            res.resume();
            callback(null, request);
            return;
        }

        res
            .on('error', (err) => {
                console.error('Error fetching typemap: ' + err.message + '; falling back to original');
                callback(null, request);
            })
            .on('data', (buf) => {
                bodyBuffers.push(buf);
            })
            .on('end', () => {
                if (!res.complete) {
                    console.error('Incomplete typemap; falling back to original');
                    callback(null, request);
                    return;
                }

                // Parse the typemap and perform negotiation
                const selectedTm = awsPerformTypemapNegotiation(
                    request.headers,
                    typemapParse(Buffer.concat(bodyBuffers).toString()),
                    new Map([
                        ['accept', SERVER_TYPE_WHITELIST],
                        ['accept-encoding', SERVER_ENCODING_WHITELIST]]));

                // XXX: This should return a 406; alas this requires using API gateway ;(
                if (selectedTm === null) {
                    console.error('Negotiation failed; falling back to original');
                    callback(null, request);
                    return;
                }

                // Swizzle the request URL, using the URL constructor to resolve
                // relative URLs. The hostname is ignored so we fake it safely.
                // request.uri already starts with '/', so the fake base must not
                // add another one or the resolved path would begin with '//'.
                const u = new URL(selectedTm.uri, 'http://a' + request.uri);
                request.uri = u.pathname + u.search + u.hash;
                callback(null, request);
            });
    }).on('error', (err) => {
        // Connection-level failures (DNS, refused connection) are reported on
        // the request object rather than on the response
        console.error('Error fetching typemap: ' + err.message + '; falling back to original');
        callback(null, request);
    });
};
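The final step of the handler relies on standard relative-URL resolution. The snippet below is a small sketch of that step in isolation, using Node's built-in URL class. The resolveVariant name and the sample paths are made up for illustration, and the placeholder host exists only because the URL constructor requires an absolute base.

```javascript
'use strict';

const { URL } = require('url');

// Resolve a typemap entry's URI against the path of the original request.
// requestUri is expected to start with '/', as CloudFront request URIs do.
function resolveVariant(variantUri, requestUri) {
    const u = new URL(variantUri, 'http://a' + requestUri);
    return u.pathname + u.search + u.hash;
}

// A relative URI resolves against the directory of the request path.
console.log(resolveVariant('foo.html.gz', '/posts/foo.html'));  // /posts/foo.html.gz

// An absolute path replaces the request path entirely.
console.log(resolveVariant('/img/cat.jpg', '/posts/foo.html')); // /img/cat.jpg

// A directory request resolves entries relative to that directory.
console.log(resolveVariant('index.html.gz', '/posts/'));        // /posts/index.html.gz
```

Letting the URL class do the resolution means typemaps can use relative names, absolute paths, or parent-directory references without any extra handling in the handler.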