1. Natural Load Testing

    My friend Paul Reinheimer has put together an excellent product/service that is probably of use to many of you.

    The product is called Natural Load Testing, and it harnesses some of the machinery that powers the also-excellent wonderproxy and its extremely useful VPN service.

    The gist is that once you've been granted an account (they're in private beta right now, but tell them I sent you, and if you're not a horrible person such as a spammer, scammer, or promoter of online timesuck virtual farming, you'll probably get in—just kidding about that farming clause… sort of), you can record real, practical test suites within the simple confines of your browser, and then you can use those recorded actions to generate huge amounts of test traffic to your application.

    In principle, this idea sounds like nothing new—you might already be familiar with Apache Bench, Siege, http_load, or other similar tools—but NLT is fundamentally different from these in several ways.

    First, as I already mentioned, NLT allows you to easily record user actions for later playback. This is cool but on its own is not much more than merely convenient. What isn't immediately obvious is that in addition to the requests you're making (HTTP verbs and URLs), NLT is recording other extremely important information about your actions, too: I find HTTP headers and timing particularly interesting.

    Next, NLT allows you to use the test recordings in a variable manner. That is, you can replace things like usernames and email addresses (and many other bits of variable content) with system-generated semi-random replacements. This allows you to test things like a full signup process, or semi-anonymous comment posting, all under load.
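
    To make the idea concrete, here is a minimal Python sketch of replaying a recorded signup with semi-random values swapped in. It is not NLT's implementation or API: the field names, the `randomize_fields` helper and the `example.com` addresses are all assumptions invented for illustration.

```python
import random
import string


def random_token(rng, length=8):
    """Return a random lowercase string of the given length."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def randomize_fields(recorded, rng):
    """Copy a recorded form payload, replacing the variable fields.

    `recorded` is a hypothetical dict of form fields captured while
    recording; only `username` and `email` are treated as variable here.
    """
    replayed = dict(recorded)
    token = random_token(rng)
    if "username" in replayed:
        replayed["username"] = "user_" + token
    if "email" in replayed:
        replayed["email"] = token + "@example.com"
    return replayed


# Hypothetical payload captured during one recorded signup.
recorded_signup = {
    "username": "sean",
    "email": "sean@example.com",
    "plan": "free",
}

rng = random.Random(42)  # seeded so that a run is repeatable
payloads = [randomize_fields(recorded_signup, rng) for _ in range(3)]
for payload in payloads:
    print(payload)
```

    Each replayed payload keeps the fixed fields (`plan`) and gets a fresh username and email, which is what lets a signup flow be exercised many times without colliding on unique fields.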

    NLT also keeps track of secondary content that your browser loads when you're recording the test cases. Things like CSS, JavaScript, images, and XHR/Ajax requests are easy to overlook when using less-intelligent tools. NLT records these requests and (optionally) inserts them into test suites alongside primary requests.

    Tools like Siege and the others I've mentioned are useful when you want to know how many concurrent requests your infrastructure can sustain. This is valuable data, but it is often not really practical. Handling a Slashdotting (or whatever the modern day equivalent of such things is called) is only part of the problem. Wouldn't you really prefer to know how many users can concurrently sign up for your app, or how many page-1-to-page-2 transitions you can handle, without bringing your servers to their knees (or alternatively: before scaling up and provisioning new machines in your cluster)?

    Here's a practical example. Since before the first edition of the conference, the Brooklyn Beta site had been running on my personal (read: toy) server. Before launching this year's edition of the site, which included the announcement for Summer Camp, I got a bit nervous about the load. I wasn't so much worried about the rest of my server suffering under Brooklyn Beta's traffic, but more about the Brooklyn Beta site becoming unavailable due to overloading. This seemed like a good opportunity to give NLT a whirl.

    I recorded a really simple test case by firing up NLT's proxy recorder, and visiting each page, in the order and timeframe I expected real users to browse from page to page. Then we unleashed the NLT worker hounds on the pre-release version of the site (same hardware, just not on the main URL), and discovered that it wasn't doing very well under load. I then set up Varnish and put it into the request chain (we were testing mostly dynamically-generated static content after all, so why not cache it?). The results were clear and definitive: Varnish made a huge difference, and NLT showed us exactly how. (We've since moved the Brooklyn Beta site to EC2, along with most of the rest of our infrastructure.)

    This chart shows several response times over 20 seconds with only 100 concurrent requests without Varnish, and most response times under 20 milliseconds with 500 concurrent requests once Varnish was in place. Conclusion: with Varnish in play, we got over a thousand times better performance (20 seconds versus 20 milliseconds) with five times as many concurrent workers.

    (Aside: I hope to blog in more detail about Varnish one day, but in the meantime, if you've got content you can cache, you should cache it. Look up how to do so with Varnish.)

    If NLT sounds interesting, I encourage you to go watch the demo video and sign up. Then send Paul all kinds of bug reports and feature requests so that he can make it more awesome before he accepts the few dollars you'll be begging him to take in exchange for your use of the service.

  2. Ideas of March

    A year ago, I posted about Ideas of March, which Chris got rolling.

    In it, I pledged to blog more.

    Today, I am not so proud to say that I have mostly failed to do so. If I had to come up with a reason, I'd have to say that, personally, 2011 turned out a whole lot different than I was expecting, back then—and not in a good way.

    Over the last year, however, I did post a few things that I think were interesting, and worthy of a re-read (at risk of making this post into a clip show):

    PHP Community Conference
    …a post about why I was excited about going to the PHP Community Conference in Nashville, last May. It turned out to be even better than I expected, and I'm really excited that plans are coming together for a 2012 edition.
    Gimme Bar no longer on CouchDB and Gimme Bar on MongoDB
    …a pair of posts describing some problems we had with CouchDB, and our smooth transition to MongoDB. We're still on MongoDB, and for the most part, I still really like it. I'd hinted about these posts in last year's Ideas of March post.
    Webshell
    …on Webshell, which I still use almost daily, but which has most certainly fallen out of a reasonable upkeep schedule. I really need to find some time to clean out the cobwebs. If you use HTTP and know JavaScript, you should check it out.
    Aficionado's Curse/Pessimistic Optimism
    …a post that I'm particularly proud of, mostly because I've finally managed to document (and, I hope, coin a term for) why things seem so bad, but aren't actually so bad.
    HTTP/1.0 and the Connection header
    …finally, over Christmas, I managed to post about HTTP things (-:

    I was really hoping to do more. Last year, I suggested that I might turn my talk on Fifty tips, tricks and tools into a series of small blog posts, and I'd still like to do this. Hopefully in 2012. I also have a list of other things that I'm really interested in writing about. It's just a matter of making time to do so. I plan to do that, this year. Starting with this post.

    I'd also like to get around to writing a thing or two about beer, this year…

    Much of what I said last year is still on my mind. I still miss the blogs we kept, 5+ years ago. Let's fix that.

    </navelgazing>

  3. HTTP 1.0 and the Connection header

    I have a long backlog of things to write about. One of those things is Varnish (more on that in a future post). So, over these Christmas holidays, while intentionally taking a break from real work, I decided to finally do some of the research required before I can really write about how Varnish is going to make your web apps much faster.

    To get some actual numbers, I broke out the Apache Benchmarking utility (ab), and decided to let it loose on my site (100 requests, 10 requests concurrently):

    ab -n 100 -c 10 http://seancoates.com/codes

    To my surprise, this didn't finish almost immediately. The command ran for what seemed like forever. Finally, I was presented with its output (excerpted for your reading pleasure):

    Concurrency Level:      10
    Time taken for tests:   152.476 seconds
    Complete requests:      100
    Failed requests:        0
    Write errors:           0
    Total transferred:      592500 bytes
    HTML transferred:       566900 bytes
    Requests per second:    0.66 [#/sec] (mean)
    Time per request:       15247.644 [ms] (mean)
    Time per request:       1524.764 [ms] (mean, across all concurrent requests)
    Transfer rate:          3.79 [Kbytes/sec] received

    Less than one request per second? That surely doesn't seem right. I chose /codes because the content does not depend on any sort of external service or expensive server-side processing (as described in an earlier post). Manually browsing to this same URL also feels much faster than one request per second. There's something fishy going on here.
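
    The summary figures follow directly from the totals that ab reports. Here is a quick Python check using the numbers from the output above (the variable names are mine, and the last digits differ slightly because ab works from a more precise elapsed time than the one it prints):

```python
# Figures taken from the ab output above.
total_time_s = 152.476
complete_requests = 100
concurrency = 10

# Requests per second: completed requests divided by wall-clock time.
requests_per_second = complete_requests / total_time_s

# Mean time per request as experienced by one of the concurrent clients.
time_per_request_ms = concurrency * total_time_s / complete_requests * 1000

# Mean time per request averaged across all concurrent clients.
time_per_request_all_ms = total_time_s / complete_requests * 1000

print(f"{requests_per_second:.2f} requests/sec")
print(f"{time_per_request_ms:.1f} ms per request")
print(f"{time_per_request_all_ms:.2f} ms per request (across all concurrent requests)")
```

    In other words, each of the 10 concurrent clients was waiting roughly 15 seconds for every single request.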

    I thought that there might be something off with my server configuration, so in order to rule out a concurrency issue, I decided to benchmark a single request:

    ab -n 1 -c 1 http://seancoates.com/codes

    I expected this page to load in less than 200ms. That seems reasonable for a light page that has no external dependencies, and doesn't even hit a database. Instead, I got this:

    Concurrency Level:      1
    Time taken for tests:   15.090 seconds
    Complete requests:      1
    Failed requests:        0
    Write errors:           0
    Total transferred:      5925 bytes
    HTML transferred:       5669 bytes
    Requests per second:    0.07 [#/sec] (mean)
    Time per request:       15089.559 [ms] (mean)
    Time per request:       15089.559 [ms] (mean, across all concurrent requests)
    Transfer rate:          0.38 [Kbytes/sec] received

    Over 15 seconds to render a single page‽ Clearly, this isn't what's actually happening on my site. I can confirm this with a browser, or very objectively with time and curl:

    $ time curl -s http://seancoates.com/codes > /dev/null
    
    real  0m0.122s
    user  0m0.000s
    sys   0m0.010s

    The next step is to figure out what ab is actually doing that's taking an extra ~15 seconds. Let's crank up the verbosity (might as well go all the way to 11).

    $ ab -v 11 -n 1 -c 1 http://seancoates.com/codes
    (snip)
    Benchmarking seancoates.com (be patient)...INFO: POST header == 
    ---
    GET /codes HTTP/1.0
    Host: seancoates.com
    User-Agent: ApacheBench/2.3
    Accept: */*
    
    
    ---
    LOG: header received:
    HTTP/1.1 200 OK
    Date: Mon, 26 Dec 2011 16:27:32 GMT
    Server: Apache/2.2.17 (Ubuntu) DAV/2 SVN/1.6.12 mod_fcgid/2.3.6 mod_ssl/2.2.17 OpenSSL/0.9.8o PHP/5.3.2
    X-Powered-By: PHP/5.3.2
    Vary: Accept-Encoding
    Content-Length: 5669
    Content-Type: text/html
    
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    (HTML snipped from here)
    
    LOG: Response code = 200
    ..done
    
    (snip)

    This all looked just fine. The really strange thing is that the output stalled right after LOG: Response code = 200 and right before ..done. So, something was causing ab to stall after the request was answered (we got a 200, and it's a small number of bytes).

    This is the part where I remembered that I'd seen similar behaviour before. I've lost countless hours of my life (and now one more) to this problem: some clients (such as PHP's streams) don't handle Keep-Alives in the way that one might expect.

    HTTP is hard. Really hard. Way harder than you think. Actually, it's not that hard if you remember that what you think is probably wrong if you're not absolutely sure that you're right.

    ab or httpd does the wrong thing. I'm not sure which one, and I'm not even 100% sure it's wrong (because the behaviour is not defined in the spec as far as I can tell), but since it's Apache Bench, and Apache httpd, we're talking about here, we'd think they could work together. We'd be wrong, though.

    Here's what's happening: ab is sending a HTTP 1.0 request with no Connection header, and httpd is assuming that it wants to keep the connection open, despite this. So, httpd hangs on to the socket for an additional—you guessed it—15 seconds, after the request is answered.
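
    The stall can be reproduced in miniature without Apache at all. The Python sketch below is a simulation of the behaviour described above, not a reimplementation of httpd or ab: a toy server answers one request and then lingers on the socket unless it was told `Connection: close`, and a toy client waits for the server to hang up. The half-second `KEEP_ALIVE_TIMEOUT` is a stand-in I chose for the 15 seconds seen in the real output.

```python
import socket
import threading
import time

KEEP_ALIVE_TIMEOUT = 0.5  # stand-in for the ~15 seconds observed above


def serve_once(listener):
    """Answer one request, then linger on the socket unless asked to close."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(1024)
            if not chunk:
                return
            request += chunk
        body = b"hello"
        conn.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Length: " + str(len(body)).encode() + b"\r\n\r\n" + body
        )
        if b"connection: close" not in request.lower():
            # Simulated keep-alive: hold the socket open, awaiting more requests.
            time.sleep(KEEP_ALIVE_TIMEOUT)


def timed_request(extra_headers=b""):
    """Send an HTTP/1.0 request and time how long until the server hangs up."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()
    start = time.monotonic()
    with socket.create_connection(listener.getsockname()) as client:
        request = b"GET /codes HTTP/1.0\r\nHost: localhost\r\n" + extra_headers + b"\r\n"
        client.sendall(request)
        while client.recv(1024):  # like the stalled client: read until EOF
            pass
    elapsed = time.monotonic() - start
    server.join()
    listener.close()
    return elapsed


slow = timed_request()
fast = timed_request(b"Connection: close\r\n")
print(f"no Connection header: {slow:.2f}s")
print(f"Connection: close:    {fast:.2f}s")
```

    The response arrives immediately in both cases; only the time spent waiting for the socket to close differs.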

    There are two easy ways to solve this. First, we can tell ab to actually use keep-alives properly with the -k argument. With -k, ab expects the server to keep the socket open for further requests, so it drops the connection from the client side once its requests are complete instead of waiting for the server to close it. In the previous scenario, the server behaved the same way, but the client sat waiting for the server to close the connection.

    A more reliable way to ensure that the server closes the connection (and to avoid strange keep-alive related benchmarking artifacts) is to explicitly tell the server to close the connection instead of assuming that it should be kept open. This can be easily accomplished by sending a Connection: close header along with the request:

    $ ab -H "Connection: close" -n1 -c1 http://seancoates.com/codes
    (snip)
    Concurrency Level:      1
    Time taken for tests:   0.118 seconds
    Complete requests:      1
    Failed requests:        0
    Write errors:           0
    Total transferred:      5944 bytes
    HTML transferred:       5669 bytes
    Requests per second:    8.48 [#/sec] (mean)
    Time per request:       117.955 [ms] (mean)
    Time per request:       117.955 [ms] (mean, across all concurrent requests)
    Transfer rate:          49.21 [Kbytes/sec] received
    (snip)

    118ms? That's more like it! A longer, more aggressive (and concurrent) benchmark gives me a result of 88.25 requests per second. That's in the ballpark of what I was expecting for this hardware and URL.

    The moral of the story: state the persistent connection behaviour explicitly whenever making HTTP requests.
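
    The same habit carries over to any HTTP client, not just ab. As a rough sketch, here is how it might look with Python's standard `http.client`; the `with_explicit_connection` and `fetch` helpers are hypothetical names of my own, and `fetch` is only defined here, not called, because calling it makes a real network request.

```python
import http.client


def with_explicit_connection(headers=None, keep_alive=False):
    """Return a copy of `headers` that always states the Connection policy."""
    merged = dict(headers or {})
    merged["Connection"] = "keep-alive" if keep_alive else "close"
    return merged


def fetch(host, path, keep_alive=False):
    """GET `path` from `host`, telling the server what to do with the socket.

    Defined for illustration only; calling it performs a network request.
    """
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        headers = with_explicit_connection(keep_alive=keep_alive)
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


print(with_explicit_connection({"Accept": "*/*"}))
```

    Defaulting to `close` is a deliberate choice for one-off requests and benchmarks: neither side is left guessing about who hangs up.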

  4. Webshell

    Webshell is a console-based, JavaScripty web client utility that is great for consuming, debugging and interacting with APIs.

    I use Firefox as my primary browser. The main reason I've been faithful to Mozilla is my set of add-ons. I use Firebug regularly, and I'm not sure what I'd do without JSONovich.

    Last year, as I built Gimme Bar's internal API, I found myself using Curl, extensively, and occasionally Poster, to test and debug my code.

    These two tools have allowed me to interact with HTTP, but not in an optimal way. Poster's UI is clunky and isn't scriptable (without diving into Firefox extension internals), and Curl requires a lot of Unixy glue to process the results into something more usable than visual inspection.

    I wanted something that would not only make requests, but would let me interact with the result of these requests.

    When working with Evan to debug a problem one day, I mentioned my problem, and said "I really should build something that fixes this." Evan suggested that such a thing would be really useful to him, too, and that he'd be interested in working on it.

    I'd planned on building my version of the tool in PHP. Evan is… not a PHP guy. He's a [whisper]Ruby[/whisper] guy.

    If you've seen me speak at a conference, lately, you've probably seen this graphic:

    [Image: Venn diagram]

    It shows that we have diverse roles in Gimme Bar, but everyone who touches our code can speak JavaScript. (This is another, much longer post that I maybe should write, but in the meantime, see this past PHP Advent entry.)

    Thus, Evan suggested that we write Webshell in JavaScript, with node.js as our "framework." Despite the aforementioned affinity for Ruby (cheap shots are fun! (-: ), Evan is a pretty smart guy. It turns out that this was not only convenient, but working with HTTP traffic (especially JSON results (of course)) is way better with JavaScript than it would have been with PHP.

    So, Webshell was born. If you want to see exactly what it does, you should take a look at the readme, which outlines almost all of its functionality.

    If you use curl, or any other sort of ad-hoc query, to inspect, consume, debug or otherwise touch HTTP, I hope you'll take a look at Webshell. It saves me several hours every week, and most of our Gimme Bar administration is done with it. Also, it's on GitHub so please fork and patch. I'd love to see pull requests.

  5. Gimme Bar on MongoDB

    I'm happy to report that Gimme Bar has been running very well on MongoDB since early February of this year. I previously posted on some of the reasons we decided to move off of CouchDB. If you haven't read that, please consider it a prerequisite for the consumption of this post.

    Late last year, I knew that we had no choice but to get off of CouchDB. I was dreading the port. The dread was two-fold. I dreaded learning new database software, along with its client interface, administration techniques, and general domain knowledge, but I also dreaded taking time away from progress on Gimme Bar to do something that I knew would help us in the long term, but was hard to justify from a "product" standpoint.

    I did a lot of reading on MongoDB, and I consulted with Andrei, who'd been using MongoDB with Mapalong since they launched. In the quiet void left by the holiday, on New Year's day this year, I seized the opportunity of absent co-workers, branched our git repository, put fingers-to-keyboard—which I suppose is the coding version of pen-to-paper—and started porting Gimme Bar to Mongo.

    I expected the road to MongoDB to be long, twisty, and paved with uncertainty. Instead, what I found was remarkable—incredible, even.

    Kristina Chodorow has done a near-perfect job of creating the wonderful tandem that makes up PHP's MongoDB extension and its most-excellent documentation. If it weren't for Kristina (and her employer, 10gen, for dedicating her time to this), the porting might have been as-expected: difficult and lengthy. Instead, the experience was pleasant and straightforward. We're not really used to this type of luxury in the PHP world. (-:

    From the start, I knew that our choice of technologies carried a certain amount of risk. I'm kind of a risk-averse person, so I like to weigh the benefits (some of which I outlined in the aforementioned post), and mitigate this risk whenever possible. My mitigation technique involved making my models as dumb as possible about what happens in the code between the models and the database. I wasn't 100% successful in keeping things entirely separate, but the abstraction really paid off. I had to write a lot of code, still, but I didn't have to worry too much about how deep this code had to reach. Other than a few cases, I swapped my CouchDB client code out for an extremely thin wrapper/helper class and re-wrote my queries. The whole process took only around two weeks (of most of my time). Testing, syncing everyone, rebuilding production images and development virtual machine images, and deployment took at least as long.
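
    As a rough illustration of that kind of separation (in Python rather than the PHP that Gimme Bar is written in, and with class names such as `DocumentStore` and `Asset` that are hypothetical, not Gimme Bar's own), the models talk only to a very small storage interface, so swapping the database means rewriting one thin class and the queries behind it:

```python
from abc import ABC, abstractmethod


class DocumentStore(ABC):
    """The only storage interface the models are allowed to see."""

    @abstractmethod
    def get(self, collection, doc_id):
        ...

    @abstractmethod
    def save(self, collection, doc_id, document):
        ...


class InMemoryStore(DocumentStore):
    """Stand-in backend; a thin CouchDB or MongoDB wrapper would go here."""

    def __init__(self):
        self._data = {}

    def get(self, collection, doc_id):
        return self._data.get((collection, doc_id))

    def save(self, collection, doc_id, document):
        self._data[(collection, doc_id)] = dict(document)


class Asset:
    """A 'dumb' model: it knows its fields, not how they are stored."""

    collection = "assets"

    def __init__(self, store):
        self._store = store

    def create(self, asset_id, url, tags):
        document = {"url": url, "tags": list(tags)}
        self._store.save(self.collection, asset_id, document)

    def find(self, asset_id):
        return self._store.get(self.collection, asset_id)


assets = Asset(InMemoryStore())
assets.create("a1", "http://example.com/cat.jpg", ["cats"])
print(assets.find("a1"))
```

    Nothing in `Asset` names a database, which is the property that keeps a port from reaching deep into the application code.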

    That was the story part. Here comes the opinion part (and remember, this is just my opinion; I could very well be wrong).

    After using both, extensively (for a very specific application, admittedly), I firmly believe that MongoDB is a superior NoSQL datastore solution for PHP-based, non-distributed (think Dropbox), non-mobile, web applications.

    This opinion stems almost fully from Mongo's rich query API. In the current version of Gimme Bar, we have a single map/reduce job (for tags). Everything else has been replaced by a straightforward and familiar query. The map/reduce is actually practical, and things like sorting and counting are a breeze with Mongo's cursors. I did have to cheat in a few places that I don't expect to scale very well (I used $in when I should denormalize), but the beauty of this is that I can do these things now, where with Couch, my only option was to denormalize and map. Yes, I know this carries a scaling/sharding and performance penalty, but you know what? I don't care yet. ("Yet" is very important.)
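
    For readers who haven't met `$in`, here is a small Python sketch of what such a query asks for. The data, the field names and the in-memory `find` function are all made up for illustration; this is not Gimme Bar's schema and it does not use the MongoDB driver. Only the shape of the query document mirrors what would be handed to MongoDB.

```python
# Hypothetical data: who the current user follows, and some saved assets.
followed_users = ["alice", "bob"]

assets = [
    {"_id": 1, "owner": "alice", "url": "http://example.com/a"},
    {"_id": 2, "owner": "carol", "url": "http://example.com/b"},
    {"_id": 3, "owner": "bob", "url": "http://example.com/c"},
]

# Shape of the query document: "owner is any one of these values".
query = {"owner": {"$in": followed_users}}


def find(collection, query):
    """Tiny in-memory matcher: equality and $in on scalar fields only."""
    results = []
    for doc in collection:
        for field, condition in query.items():
            if isinstance(condition, dict) and "$in" in condition:
                if doc.get(field) not in condition["$in"]:
                    break  # value is not in the allowed list
            elif doc.get(field) != condition:
                break  # plain equality failed
        else:
            results.append(doc)  # every condition matched
    return results


matching_ids = [doc["_id"] for doc in find(assets, query)]
print(matching_ids)
```

    The cost is that the list of values has to be fetched first and grows with the data; a denormalized design would instead copy what it needs onto each document so that a simple match suffices.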

    MongoDB also provides a few other things to developers that were absent in CouchDB. For example, PHP speaks to Mongo through the extension and a native driver. CouchDB uses HTTP for transport. HTTP carries a lot of overhead when you need to do a lot of single-document requests (for example, when topping up a pagination set that's had records de-duplicated). My favourite difference, though, is in the atomic operations, such as findAndModify, which make a huge difference both logic- and performance-wise, at least for Gimme Bar.
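
    To show why an atomic find-and-modify matters, here is a Python simulation of the semantics. `TinyCollection` is an in-memory stand-in that I made up for this sketch, not the MongoDB driver; a lock plays the role of the server-side atomicity. Four workers race to claim 100 jobs, and because finding and updating happen as a single step, no job is ever claimed twice.

```python
import threading


class TinyCollection:
    """In-memory stand-in for a collection; NOT the MongoDB driver."""

    def __init__(self, documents):
        self._documents = [dict(d) for d in documents]
        self._lock = threading.Lock()

    def find_and_modify(self, query, update):
        """Atomically update the first document matching `query`.

        Returns a copy of the updated document, or None if nothing matched.
        (The real command can return either the old or the new version.)
        """
        with self._lock:  # find + modify happen as one indivisible step
            for doc in self._documents:
                if all(doc.get(k) == v for k, v in query.items()):
                    doc.update(update)
                    return dict(doc)
        return None


jobs = TinyCollection([{"_id": n, "state": "new"} for n in range(100)])
claimed = []


def worker(name):
    """Claim jobs until none are left."""
    while True:
        job = jobs.find_and_modify({"state": "new"}, {"state": "claimed", "by": name})
        if job is None:
            return
        claimed.append(job["_id"])  # list.append is thread-safe in CPython


threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(len(claimed), len(set(claimed)))
```

    With a separate find followed by an update, two workers could both see the same "new" job before either marks it, and the application would need its own locking to cope.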

    Of course, there are two sides to every coin. There are CouchDB features that I miss. Namely: replication, change notification, CouchDB-Lucene (we're using ElasticSearch and manual indexing now), and Futon.

    Do I think MongoDB is superior to CouchDB? It depends what you're using it for. If you need truly excellent eventual-consistency replication, CouchDB might be a better choice. If you want to have your JavaScript applications talk directly to the datastore, CouchDB is definitely the way to go. Do I have a problem with CouchDB, its developers or its community? Not at all. It's just not a good fit for the kind of app we're building.

    The bottom line is that I'm extremely happy with our port to MongoDB, and I don't have any regrets about switching other than not doing it sooner.