HTTP 1.0 and the Connection header

    I have a long backlog of things to write about. One of those things is Varnish (more on that in a future post). So, over these Christmas holidays, while intentionally taking a break from real work, I decided to finally do some of the research required before I can really write about how Varnish is going to make your web apps much faster.

    To get some actual numbers, I broke out the Apache Benchmarking utility (ab), and decided to let it loose on my site (100 requests, 10 requests concurrently):

    ab -n 100 -c 10 http://seancoates.com/codes

    To my surprise, this didn't finish almost immediately. The command ran for what seemed like forever. Finally, I was presented with its output (excerpted for your reading pleasure):

    Concurrency Level:      10
    Time taken for tests:   152.476 seconds
    Complete requests:      100
    Failed requests:        0
    Write errors:           0
    Total transferred:      592500 bytes
    HTML transferred:       566900 bytes
    Requests per second:    0.66 [#/sec] (mean)
    Time per request:       15247.644 [ms] (mean)
    Time per request:       1524.764 [ms] (mean, across all concurrent requests)
    Transfer rate:          3.79 [Kbytes/sec] received

    Less than one request per second? That surely doesn't seem right. I chose /codes because the content does not depend on any sort of external service or expensive server-side processing (as described in an earlier post). Manually browsing to this same URL also feels much faster than one request per second. There's something fishy going on here.

    I thought that there might be something off with my server configuration, so in order to rule out a concurrency issue, I decided to benchmark a single request:

    ab -n 1 -c 1 http://seancoates.com/codes

    I expected this page to load in less than 200ms. That seems reasonable for a light page that has no external dependencies, and doesn't even hit a database. Instead, I got this:

    Concurrency Level:      1
    Time taken for tests:   15.090 seconds
    Complete requests:      1
    Failed requests:        0
    Write errors:           0
    Total transferred:      5925 bytes
    HTML transferred:       5669 bytes
    Requests per second:    0.07 [#/sec] (mean)
    Time per request:       15089.559 [ms] (mean)
    Time per request:       15089.559 [ms] (mean, across all concurrent requests)
    Transfer rate:          0.38 [Kbytes/sec] received

    Over 15 seconds to render a single page‽ Clearly, this isn't what's actually happening on my site. I can confirm this with a browser, or very objectively with time and curl:

    $ time curl -s http://seancoates.com/codes > /dev/null
    real  0m0.122s
    user  0m0.000s
    sys   0m0.010s

    The next step is to figure out what ab is actually doing that's taking an extra ~15 seconds. Let's crank up the verbosity (might as well go all the way to 11).

    $ ab -v 11 -n 1 -c 1 http://seancoates.com/codes
    Benchmarking seancoates.com (be patient)...INFO: POST header == 
    GET /codes HTTP/1.0
    Host: seancoates.com
    User-Agent: ApacheBench/2.3
    Accept: */*
    LOG: header received:
    HTTP/1.1 200 OK
    Date: Mon, 26 Dec 2011 16:27:32 GMT
    Server: Apache/2.2.17 (Ubuntu) DAV/2 SVN/1.6.12 mod_fcgid/2.3.6 mod_ssl/2.2.17 OpenSSL/0.9.8o PHP/5.3.2
    X-Powered-By: PHP/5.3.2
    Vary: Accept-Encoding
    Content-Length: 5669
    Content-Type: text/html
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    (HTML snipped from here)
    LOG: Response code = 200

    This all looked just fine. The really strange thing is that the output stalled right after LOG: Response code = 200 and right before ..done. So, something was causing ab to stall after the request was answered (we got a 200, and it's a small number of bytes).

    This is the part where I remembered that I've seen a similar behaviour before. I've lost countless hours of my life (and now one more) to this problem: some clients (such as PHP's streams) don't handle Keep-Alives in the way that one might expect.

    HTTP is hard. Really hard. Way harder than you think. Actually, it's not that hard if you remember that what you think is probably wrong if you're not absolutely sure that you're right.

    Either ab or httpd is doing the wrong thing. I'm not sure which one, and I'm not even 100% sure it's wrong (because the behaviour is not defined in the spec, as far as I can tell), but since it's Apache Bench and Apache httpd we're talking about here, we'd think they could work together. We'd be wrong, though.

    Here's what's happening: ab is sending an HTTP 1.0 request with no Connection header, and httpd is assuming that it wants to keep the connection open, despite this. So, httpd hangs on to the socket for an additional—you guessed it—15 seconds after the request is answered.
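    The stall is easy to reproduce outside of ab and httpd. Below is a minimal Python sketch (my own illustration, not ab's or httpd's actual code): a toy server answers one request and then, like the misbehaving httpd, holds the socket open before closing it (1.5 seconds here instead of 15). A client that reads until EOF, as ab does, eats the whole delay; a client that stops after Content-Length bytes returns immediately.

    ```python
    import socket
    import threading
    import time

    BODY = b"hello"
    RESPONSE = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Length: 5\r\n"
        b"Content-Type: text/plain\r\n"
        b"\r\n" + BODY
    )
    HOLD_OPEN = 1.5  # stand-in for httpd's 15-second keep-alive wait

    def serve_one(listener):
        # Answer one request, then keep the socket open for a while before
        # closing it -- mimicking what httpd did with this HTTP 1.0 request.
        conn, _ = listener.accept()
        conn.recv(4096)          # read the request (contents ignored)
        conn.sendall(RESPONSE)
        time.sleep(HOLD_OPEN)    # the server "hangs on to the socket"
        conn.close()

    def timed_get(port, read_until_eof):
        start = time.time()
        c = socket.create_connection(("127.0.0.1", port))
        c.sendall(b"GET / HTTP/1.0\r\nHost: example.test\r\n\r\n")
        data = b""
        if read_until_eof:
            # Like ab: keep reading until the server closes the connection.
            while chunk := c.recv(4096):
                data += chunk
        else:
            # Read the headers, then stop once Content-Length bytes arrive.
            while b"\r\n\r\n" not in data:
                data += c.recv(4096)
            head, body = data.split(b"\r\n\r\n", 1)
            length = next(int(line.split(b":")[1])
                          for line in head.split(b"\r\n")
                          if line.lower().startswith(b"content-length"))
            while len(body) < length:
                body += c.recv(4096)
        c.close()
        return time.time() - start

    def run(read_until_eof):
        listener = socket.socket()
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        t = threading.Thread(target=serve_one, args=(listener,))
        t.start()
        elapsed = timed_get(listener.getsockname()[1], read_until_eof)
        t.join()
        listener.close()
        return elapsed

    print(f"read until EOF:      {run(True):.2f}s")   # roughly HOLD_OPEN
    print(f"read Content-Length: {run(False):.2f}s")  # nearly instant
    ```

    The EOF-reading client isn't wrong per se—for an HTTP 1.0 response without a Content-Length, reading until close is the only way to know the body has ended—which is exactly why the blame here is so hard to assign.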

    There are two easy ways to solve this. First, we can tell ab to actually use keep-alives properly, with the -k argument. With -k, ab drops the connection on the client side as soon as the request is complete: since it expects the server to keep the socket open, awaiting further requests on the same socket, it doesn't wait for the server to close the connection. In the previous scenario, the server behaved the same way, but the client sat waiting for a close that didn't come for 15 seconds.

    A more reliable way to ensure that the server closes the connection (and to avoid strange keep-alive related benchmarking artifacts) is to explicitly tell the server to close the connection instead of assuming that it should be kept open. This can be easily accomplished by sending a Connection: close header along with the request:

    $ ab -H "Connection: close" -n1 -c1 http://seancoates.com/codes
    Concurrency Level:      1
    Time taken for tests:   0.118 seconds
    Complete requests:      1
    Failed requests:        0
    Write errors:           0
    Total transferred:      5944 bytes
    HTML transferred:       5669 bytes
    Requests per second:    8.48 [#/sec] (mean)
    Time per request:       117.955 [ms] (mean)
    Time per request:       117.955 [ms] (mean, across all concurrent requests)
    Transfer rate:          49.21 [Kbytes/sec] received

    118ms? That's more like it! A longer, more aggressive (and concurrent) benchmark gives me a result of 88.25 requests per second. That's in the ballpark of what I was expecting for this hardware and URL.

    The moral of the story: state the persistent-connection behaviour explicitly whenever making HTTP requests, either with a Connection: close header or with a proper keep-alive strategy.
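    In a scripted client, that can be as simple as attaching the header to every request you build. A quick Python sketch (the URL is just a placeholder):

    ```python
    import urllib.request

    # Build a request that explicitly states the connection behaviour,
    # rather than leaving it to the server's (or library's) defaults.
    req = urllib.request.Request(
        "http://example.com/",
        headers={"Connection": "close"},
    )
    print(req.get_header("Connection"))  # -> close
    ```

    With the header set, a well-behaved server closes the socket as soon as the response is written, so the client never waits out a keep-alive timeout.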

    8 Responses

    • Theoretically, HTTP/1.0 should default to Connection: close and HTTP/1.1 should default to Connection: keep-alive. In practice, you almost never want to use HTTP/1.0 (it doesn't support virtual hosts per-spec, due to the lack of the Host header), and you should assume HTTP/1.1 behaviour even when you send an HTTP/1.0 request.

    • Andrei Zmievski

      2011 Dec 27 12:58

      This is why you use siege instead of apachebench.

    • Hey Sean, I see you're testing "your site." Is the site based on a particular framework?  I've seen exactly the problem you talk about when benching against Lithium using ab, but other frameworks don't appear to have that problem. (Solution, as noted by Andrei, was to use siege or http_load.)

    • Paul: Yeah, the page I was testing is Lithium-based. You're right, I've noticed this on Lithium more than other places; I wonder what strange thing they're doing.

      Also, I posted the solution (-H "Connection: close"). I don't know what you guys are talking about. (-;

      (I did notice that siege didn't exhibit the same problem when I was testing this. The real end-goal is to test with NLT (https://load.wondernetwork.com/).)


    • I think the end goal is to use tsung – after all, it's written in Erlang. But in all seriousness, tsung is an awesome tool for load testing – distributed, etc. We frequently use it to load test parts of the application.
      Tsung is not as simple as siege or Apache Bench, though. By comparison, it's like a swiss army knife next to a kitchen knife: the kitchen knife is easier to use.
      As for keep-alive – I think there is a misunderstanding: keep-alive does not enable a persistent connection. It enables the client to re-use an existing connection when it requests multiple things – for example assets (css, js, img).
      But unless you're serving your app along with assets from the same host, I don't see a good reason for keep-alive, period. Again, I'd probably test how it performs for serving lots of assets, but your Apache Bench run doesn't reflect that either, since it's a very simple tool. And in the end, that's the reason why most people turn keep-alive off by default.

      (Btw, the first time I tried to post this comment I got an "action is forbidden" error.)

    • It's fine to make fun of apachebench, but technically it is right in this case, isn't it? The fault lies with the server...

      - the "Connection" header is not in the HTTP 1.0 spec

      - any server (and proxy) receiving an HTTP 1.0 request should default to no keep-alives

      - I understand that HTTP 1.1 conquered the world a while ago, but still: if I want to write my own HTTP client, supporting 1.0 is almost trivial, while 1.1 is definitely not. And if all servers/proxies are smart enough to handle 1.1 correctly, why should they not be able to understand 1.0?

    • PS: according to the php devs (Ilia at least), this bug lies server-side. See https://bugs.php.net/bug.php?id=42779. I think the problem lies not within Apache's httpd itself, but possibly in the scripting engine used to deliver the page you were testing...

    • PPS: lo and behold, from the ashes of ab a new script is born: ezab.php (https://github.com/gggeek/ezab).

      - pure php script (easy to hack)
      - aims to support all features of ab
      - already has better support for keepalives and response compression

      Volunteers welcome