It's no secret that reducing the size of HTTP responses can lead to performance improvements. Surprisingly, this is not a linear relationship; decreasing response size only slightly can dramatically reduce the time required to transfer the data.
This document explains the throughput characteristics of an established TCP connection and how they can shape performance, often in surprising ways.
Note: I'm making some simplifying assumptions here so that things are easier to model: a pre-existing, idle TCP connection and no packet loss. This effectively shows the best-case scenario for how TCP can handle a response.
A (brief) refresher on TCP
TCP has several mechanisms that govern how fast the sender can send data.
While a comprehensive understanding of TCP is way, way beyond the scope of this note (and not something the author would claim to possess anyway), the basic flow control mechanisms are not horribly complicated.
First, a bit of vocabulary
- sender -- the party sending data, e.g. an HTTP client when sending a request, or an HTTP server when sending a response; both parties in a TCP connection are senders and receivers
- receiver -- the party receiving data, e.g. an HTTP client when receiving a response, or an HTTP server when receiving a request; both parties in a TCP connection are senders and receivers
- data segment -- a single IP packet containing a TCP header and at least one byte of application data
- congestion window (cwnd) -- the number of un-acknowledged data segments issued by the sender that can be in-flight at once; changes over time as the sender observes congestion on the connection
- initial congestion window (IW or initcwnd) -- the initial value of cwnd for new connections; 10 is the standard value and what Facebook uses
- maximum segment size (MSS) -- the largest possible size of a single data segment; negotiated during TCP handshaking
- receive window (rwnd) -- the number of bytes that the receiver is willing to buffer for the application
- round-trip time (RTT) -- the amount of time it takes a packet to travel from the sender to the receiver, and back again; colloquially known as "ping time"
The maximum amount of data in flight from the sender to the receiver is defined as min(MSS * cwnd, rwnd).
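A minimal Python sketch of that formula, treating cwnd as a segment count and rwnd as a byte count, as in the definitions above. The function name and the example window sizes are illustrative assumptions, not values from any particular TCP stack.

```python
def max_in_flight_bytes(mss: int, cwnd: int, rwnd: int) -> int:
    """Upper bound on unacknowledged bytes the sender may have in flight."""
    # cwnd counts segments, so scale it by MSS to get bytes before comparing
    # it with rwnd, which is already expressed in bytes.
    return min(mss * cwnd, rwnd)


# With MSS 1300 and cwnd 10, a 65,535-byte receive window is not the limit...
print(max_in_flight_bytes(1300, 10, 65_535))  # 13000
# ...but a small receive window caps the flight instead.
print(max_in_flight_bytes(1300, 10, 8_192))   # 8192
```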
Each ACK for a data segment that arrives back at the sender frees up a slot in cwnd. If the sender is unable to send additional data segments because there are already cwnd unacknowledged segments in flight, it can send out new data each time an ACK arrives. In addition, during TCP's slow-start phase, cwnd is incremented by 1 each time an ACK is received, effectively doubling cwnd each time a flight of ACKs arrives for the outstanding data segments.
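To see why a +1-per-ACK rule doubles the window, here is a small simulation sketch. It assumes every segment is ACKed individually (no delayed ACKs), no loss, and that the connection never leaves slow start; the function name is hypothetical.

```python
def slow_start_windows(initcwnd: int, flights: int) -> list[int]:
    """cwnd at the start of each flight, growing by 1 per ACK received."""
    cwnd = initcwnd
    windows = []
    for _ in range(flights):
        windows.append(cwnd)
        outstanding = cwnd          # sender fills the window, then waits
        for _ in range(outstanding):
            cwnd += 1               # each returning ACK grows cwnd by one segment
    return windows


print(slow_start_windows(10, 4))  # [10, 20, 40, 80]
```

A flight of cwnd segments produces cwnd ACKs, so the window grows by exactly its own size each round trip.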
Data flights and bandwidth
The sender can have up to cwnd segments in flight at a given time. Beyond that, the sender is stuck waiting for ACKs before it can emit additional segments. For large responses, this means that we typically see a pattern where cwnd segments are emitted all at once, an RTT passes, and cwnd ACKs arrive all at once. At this point, the sender can send out another flight of segments. As a result, output tends to be bursty, with a period equal to the RTT.
Recall that each ACK received increments cwnd by 1. For a large response (i.e. the sender wants to send as much as possible at every opportunity), every data flight is twice as large as the one before it.
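The bursty pattern can be sketched as a schedule of (send time, segments) pairs. This assumes the idealized model above: a response large enough to fill every flight, a fixed RTT, and ACKs for a whole flight arriving together. The function name and the 150ms RTT are illustrative.

```python
def burst_schedule(initcwnd: int, rtt_ms: int, flights: int) -> list[tuple[int, int]]:
    """(send time in ms, segments sent) for each burst of a large response."""
    schedule = []
    cwnd = initcwnd
    for i in range(flights):
        # A new burst goes out each time the previous flight's ACKs return,
        # i.e. once per RTT, and each burst is twice the size of the last.
        schedule.append((i * rtt_ms, cwnd))
        cwnd *= 2
    return schedule


print(burst_schedule(10, 150, 4))  # [(0, 10), (150, 20), (300, 40), (450, 80)]
```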
How long does it take to send a response?
If we're able to fit our response into the first data flight, we will require only a single round trip to receive the response. Conversely, if our response is even a single byte too large, the full response will not be available to the receiver until an additional RTT has passed.
This illustrates an interesting property of TCP's congestion control algorithm: when investigating latency, it's useful to think of transmission size in terms of the number of data flights required to transmit it, rather than in absolute byte counts. That is, a single-byte response will take just as much time to receive as a response of cwnd * MSS bytes.
Here is the amount of time required to transmit various data payloads on typical cell networks around the world. Assumptions: MSS of 1300 bytes, cwnd of 10 (the IETF-recommended IW), a receive window large enough that it never limits the flight size, and RTTs as shown for various countries.
- 1xRTT (USA 150ms; India 1200ms; Brazil 600ms): 1 byte - 13,000 bytes
- 2xRTT (USA 300ms; India 2400ms; Brazil 1200ms): 13,001 bytes - 39,000 bytes
- 3xRTT (USA 450ms; India 3600ms; Brazil 1800ms): 39,001 bytes - 91,000 bytes
- 4xRTT (USA 600ms; India 4800ms; Brazil 2400ms): 91,001 bytes - 195,000 bytes
RTT values are hypothetical but realistic RTTs for cell network users in the respective countries.
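The ranges above follow directly from the stated assumptions. This sketch regenerates them: each flight adds cwnd * MSS bytes of capacity and cwnd doubles between flights. The constants are the ones given in the text; the helper names are hypothetical.

```python
MSS = 1300        # bytes, as assumed in the text
INITCWND = 10     # segments
RTT_MS = {"USA": 150, "India": 1200, "Brazil": 600}  # hypothetical RTTs from the text


def flight_ranges(num_flights: int, mss: int = MSS, initcwnd: int = INITCWND):
    """(flights, smallest size, largest size) in bytes for each flight count."""
    ranges = []
    smallest = 1
    total = 0
    cwnd = initcwnd
    for n in range(1, num_flights + 1):
        total += cwnd * mss  # cumulative capacity after n flights
        ranges.append((n, smallest, total))
        smallest = total + 1
        cwnd *= 2
    return ranges


def fmt_bytes(n: int) -> str:
    return f"{n:,} byte" + ("" if n == 1 else "s")


for n, lo, hi in flight_ranges(4):
    times = "; ".join(f"{country} {n * rtt}ms" for country, rtt in RTT_MS.items())
    print(f"- {n}xRTT ({times}): {fmt_bytes(lo)} - {fmt_bytes(hi)}")
```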
Ok, how can we speed things up?
Using the above table, we can see that if we have a response that tends to be around 40k, the effort to reduce it below the 39k threshold will cut the time to receive the data by a third (from 3 RTTs to 2, e.g. from 450ms to 300ms in the USA)! Given that network time often dominates performance, this can be a significant win.
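The saving can be checked with the closed form behind the table: under the same idealized assumptions, the first n flights carry IW * MSS * (2**n - 1) bytes, a geometric series. This sketch finds the smallest n that fits each size; the function name is hypothetical and the RTTs are the text's illustrative values.

```python
def flights_needed(size: int, mss: int = 1300, initcwnd: int = 10) -> int:
    """Smallest n such that the first n flights can carry `size` bytes."""
    n = 0
    # Capacity of n doubling flights is a geometric series: IW * MSS * (2**n - 1).
    while initcwnd * mss * (2**n - 1) < size:
        n += 1
    return n


before = flights_needed(40_000)  # a ~40k response
after = flights_needed(39_000)   # trimmed to the 39,000-byte threshold
for country, rtt in {"USA": 150, "India": 1200, "Brazil": 600}.items():
    print(f"{country}: {before * rtt}ms -> {after * rtt}ms "
          f"({1 - after / before:.0%} less time)")
```

Going from 3 flights to 2 removes one RTT out of three, hence the one-third reduction in every country.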
If you are running your own server, you could also increase the initial congestion window (initcwnd) directly, though you really want to be sure you know what you're doing; it's easy to cause performance problems by introducing congestion into the network that would otherwise have been avoided. For kicks, here's a link showing the IW values for major CDN providers.
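To get a feel for what a larger IW buys in the best case, this sketch compares flight counts for a few response sizes. The IW values 16 and 32 are hypothetical, chosen only for comparison, and the model assumes no loss and no congestion, which is precisely the risk the warning above is about; it shows the idealized upside only.

```python
def flights_needed(size: int, mss: int, initcwnd: int) -> int:
    """Data flights needed for `size` bytes under idealized slow start."""
    flights, capacity, cwnd = 0, 0, initcwnd
    while capacity < size:
        capacity += cwnd * mss  # bytes carried by this flight
        cwnd *= 2               # window doubles for the next flight
        flights += 1
    return flights


SIZES = (13_000, 40_000, 100_000)
for iw in (10, 16, 32):  # 16 and 32 are hypothetical IW values for comparison
    print(f"IW {iw}:", [flights_needed(size, 1300, iw) for size in SIZES])
```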