HTTP 1.0 and the Connection header

    I have a long backlog of things to write about. One of those things is Varnish (more on that in a future post). So, over these Christmas holidays, while intentionally taking a break from real work, I decided to finally do some of the research required before I can really write about how Varnish is going to make your web apps much faster.

    To get some actual numbers, I broke out the Apache Benchmarking utility (ab), and decided to let it loose on my site (100 requests, 10 requests concurrently):

    ab -n 100 -c 10 http://seancoates.com/codes

    To my surprise, this didn't finish almost immediately. The command ran for what seemed like forever. Finally, I was presented with its output (excerpted for your reading pleasure):

    Concurrency Level:      10
    Time taken for tests:   152.476 seconds
    Complete requests:      100
    Failed requests:        0
    Write errors:           0
    Total transferred:      592500 bytes
    HTML transferred:       566900 bytes
    Requests per second:    0.66 [#/sec] (mean)
    Time per request:       15247.644 [ms] (mean)
    Time per request:       1524.764 [ms] (mean, across all concurrent requests)
    Transfer rate:          3.79 [Kbytes/sec] received

    Less than one request per second? That surely doesn't seem right. I chose /codes because the content does not depend on any sort of external service or expensive server-side processing (as described in an earlier post). Manually browsing to this same URL also feels much faster than one request per second. There's something fishy going on here.

    I thought that there might be something off with my server configuration, so in order to rule out a concurrency issue, I decided to benchmark a single request:

    ab -n 1 -c 1 http://seancoates.com/codes

    I expected this page to load in less than 200ms. That seems reasonable for a light page that has no external dependencies, and doesn't even hit a database. Instead, I got this:

    Concurrency Level:      1
    Time taken for tests:   15.090 seconds
    Complete requests:      1
    Failed requests:        0
    Write errors:           0
    Total transferred:      5925 bytes
    HTML transferred:       5669 bytes
    Requests per second:    0.07 [#/sec] (mean)
    Time per request:       15089.559 [ms] (mean)
    Time per request:       15089.559 [ms] (mean, across all concurrent requests)
    Transfer rate:          0.38 [Kbytes/sec] received

    Over 15 seconds to render a single page‽ Clearly, this isn't what's actually happening on my site. I can confirm this with a browser, or very objectively with time and curl:

    $ time curl -s http://seancoates.com/codes > /dev/null
    real  0m0.122s
    user  0m0.000s
    sys   0m0.010s

    The next step is to figure out what ab is actually doing that's taking an extra ~15 seconds. Let's crank up the verbosity (might as well go all the way to 11).

    $ ab -v 11 -n 1 -c 1 http://seancoates.com/codes
    Benchmarking seancoates.com (be patient)...INFO: POST header == 
    GET /codes HTTP/1.0
    Host: seancoates.com
    User-Agent: ApacheBench/2.3
    Accept: */*
    LOG: header received:
    HTTP/1.1 200 OK
    Date: Mon, 26 Dec 2011 16:27:32 GMT
    Server: Apache/2.2.17 (Ubuntu) DAV/2 SVN/1.6.12 mod_fcgid/2.3.6 mod_ssl/2.2.17 OpenSSL/0.9.8o PHP/5.3.2
    X-Powered-By: PHP/5.3.2
    Vary: Accept-Encoding
    Content-Length: 5669
    Content-Type: text/html
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    (HTML snipped from here)
    LOG: Response code = 200

    This all looked just fine. The really strange thing is that the output stalled right after LOG: Response code = 200 and right before ..done. So, something was causing ab to stall after the request was answered (we got a 200, and it's a small number of bytes).

    This is the part where I remembered that I've seen a similar behaviour before. I've lost countless hours of my life (and now one more) to this problem: some clients (such as PHP's streams) don't handle Keep-Alives in the way that one might expect.

    HTTP is hard. Really hard. Way harder than you think. Actually, it's not that hard if you remember that what you think is probably wrong if you're not absolutely sure that you're right.

    Either ab or httpd is doing the wrong thing. I'm not sure which one, and I'm not even 100% sure it's wrong (because the behaviour is not defined in the spec, as far as I can tell), but since it's Apache Bench and Apache httpd we're talking about here, we'd think they could work together. We'd be wrong, though.

    Here's what's happening: ab is sending an HTTP 1.0 request with no Connection header, and httpd is assuming that it wants to keep the connection open, despite this. So, httpd hangs on to the socket for an additional—you guessed it—15 seconds after the request is answered.
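    The stall is easy to reproduce outside of ab and httpd. Below is a minimal Python sketch (my own illustration, not ab's or httpd's actual code): a toy server answers one request and then, like the misbehaving httpd, holds the socket open before closing it (1.5 seconds here instead of 15). A client that reads until EOF, as ab does, eats the whole delay; a client that stops after Content-Length bytes returns immediately.

    ```python
    import socket
    import threading
    import time

    BODY = b"hello"
    RESPONSE = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Length: 5\r\n"
        b"Content-Type: text/plain\r\n"
        b"\r\n" + BODY
    )
    HOLD_OPEN = 1.5  # stand-in for httpd's 15-second keep-alive wait

    def serve_one(listener):
        # Answer one request, then keep the socket open for a while before
        # closing it -- mimicking what httpd did with this HTTP 1.0 request.
        conn, _ = listener.accept()
        conn.recv(4096)          # read the request (contents ignored)
        conn.sendall(RESPONSE)
        time.sleep(HOLD_OPEN)    # the server "hangs on to the socket"
        conn.close()

    def timed_get(port, read_until_eof):
        start = time.time()
        c = socket.create_connection(("127.0.0.1", port))
        c.sendall(b"GET / HTTP/1.0\r\nHost: example.test\r\n\r\n")
        data = b""
        if read_until_eof:
            # Like ab: keep reading until the server closes the connection.
            while chunk := c.recv(4096):
                data += chunk
        else:
            # Read the headers, then stop once Content-Length bytes arrive.
            while b"\r\n\r\n" not in data:
                data += c.recv(4096)
            head, body = data.split(b"\r\n\r\n", 1)
            length = next(int(line.split(b":")[1])
                          for line in head.split(b"\r\n")
                          if line.lower().startswith(b"content-length"))
            while len(body) < length:
                body += c.recv(4096)
        c.close()
        return time.time() - start

    def run(read_until_eof):
        listener = socket.socket()
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        t = threading.Thread(target=serve_one, args=(listener,))
        t.start()
        elapsed = timed_get(listener.getsockname()[1], read_until_eof)
        t.join()
        listener.close()
        return elapsed

    print(f"read until EOF:      {run(True):.2f}s")   # roughly HOLD_OPEN
    print(f"read Content-Length: {run(False):.2f}s")  # nearly instant
    ```

    The EOF-reading client isn't wrong per se—for an HTTP 1.0 response without a Content-Length, reading until close is the only way to know the body has ended—which is exactly why the blame here is so hard to assign.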

    There are two easy ways to solve this. First, we can tell ab to actually use keep-alives properly, with the -k argument. With -k, ab drops the connection on the client side as soon as the request is complete: since it expects the server to keep the socket open, awaiting further requests on the same socket, it doesn't wait for the server to close the connection. In the previous scenario, the server behaved the same way, but the client sat waiting for a close that didn't come for 15 seconds.

    A more reliable way to ensure that the server closes the connection (and to avoid strange keep-alive related benchmarking artifacts) is to explicitly tell the server to close the connection instead of assuming that it should be kept open. This can be easily accomplished by sending a Connection: close header along with the request:

    $ ab -H "Connection: close" -n1 -c1 http://seancoates.com/codes
    Concurrency Level:      1
    Time taken for tests:   0.118 seconds
    Complete requests:      1
    Failed requests:        0
    Write errors:           0
    Total transferred:      5944 bytes
    HTML transferred:       5669 bytes
    Requests per second:    8.48 [#/sec] (mean)
    Time per request:       117.955 [ms] (mean)
    Time per request:       117.955 [ms] (mean, across all concurrent requests)
    Transfer rate:          49.21 [Kbytes/sec] received

    118ms? That's more like it! A longer, more aggressive (and concurrent) benchmark gives me a result of 88.25 requests per second. That's in the ballpark of what I was expecting for this hardware and URL.

    The moral of the story: state the persistent-connection behaviour explicitly whenever making HTTP requests, either with a Connection: close header or with a proper keep-alive strategy.
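    In a scripted client, that can be as simple as attaching the header to every request you build. A quick Python sketch (the URL is just a placeholder):

    ```python
    import urllib.request

    # Build a request that explicitly states the connection behaviour,
    # rather than leaving it to the server's (or library's) defaults.
    req = urllib.request.Request(
        "http://example.com/",
        headers={"Connection": "close"},
    )
    print(req.get_header("Connection"))  # -> close
    ```

    With the header set, a well-behaved server closes the socket as soon as the response is written, so the client never waits out a keep-alive timeout.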

    8 Responses

    • Theoretically, HTTP/1.0 should default to Connection: close and HTTP/1.1 should default to Connection: keep-alive. In practice, you almost never want to use HTTP/1.0 (it doesn't support virtual hosts per-spec, due to the lack of the Host header), and you should assume HTTP/1.1 behaviour even when you send an HTTP/1.0 request.

    • Andrei Zmievski

      2011 Dec 27 12:58

      This is why you use siege instead of apachebench.

    • Hey Sean, I see you're testing "your site." Is the site based on a particular framework?  I've seen exactly the problem you talk about when benching against Lithium using ab, but other frameworks don't appear to have that problem. (Solution, as noted by Andrei, was to use siege or http_load.)

    • Paul: Yeah, the page I was testing is Lithium-based. You're right, I've noticed this on Lithium more than other places; I wonder what strange thing they're doing.

      Also, I posted the solution (-H "Connection: close"). I don't know what you guys are talking about. (-;

      (I did notice that siege didn't exhibit the same problem when I was testing this. The real end-goal is to test with NLT (https://load.wondernetwork.com/).)


    • I think the end goal is to use tsung – after all, it's written in Erlang. But in all seriousness, tsung is an awesome tool for load testing – distributed, etc. We frequently use it to load test parts of the application.
      Tsung is not as simple as siege or Apache Bench, though. By comparison, it's like a swiss army knife next to a kitchen knife: the kitchen knife is easier to use.
      As for keep-alive – I think there is a misunderstanding: keep-alive does not enable a persistent connection. It enables the client to re-use an existing connection when it requests multiple things – for example assets (css, js, img).
      But unless you're serving your app along with assets from the same host, I don't see a good reason for keep-alive, period. Again, I'd probably test how it performs for serving lots of assets, but your Apache Bench run doesn't reflect that either, since it's a very simple tool. And in the end, that's the reason why most people turn keep-alive off by default.

      (Btw, the first time I tried to post this comment I got an "action is forbidden" error.)

    • It's fine to make fun of apachebench, but technically it is right in this case, isn't it? The fault lies with the server...

      - the "Connection" header is not in the HTTP 1.0 spec

      - any server (and proxy) receiving an HTTP 1.0 request should default to no keep-alives

      - I understand that HTTP 1.1 conquered the world a while ago, but still: if I want to write my own HTTP client, supporting 1.0 is almost trivial, while 1.1 is definitely not. And if all servers/proxies are smart enough to handle 1.1 correctly, why should they not be able to understand 1.0?

    • PS: according to the php devs (Ilia at least), this bug lies server-side. See https://bugs.php.net/bug.php?id=42779. I think the problem lies not within Apache's httpd itself, but possibly in the scripting engine used to deliver the page you were testing...

    • PPS: lo and behold, from the ashes of ab a new script is born: ezab.php (https://github.com/gggeek/ezab).

      - pure php script (easy to hack)
      - aims to support all features of ab
      - already has better support for keepalives and response compression

      Volunteers welcome