1. Use `env`

    We use quite a few technologies to build our products, but Gimme Bar is still primarily a PHP app.

    To support these apps, we have a number of command-line scripts that handle maintenance tasks, cron jobs, data migration jobs, data processing workers, etc.

    These scripts often run PHP in Gimme Bar land, and we make extensive use of the shebang syntax: the common Unix practice of putting #!/path/to/interpreter at the beginning of our command-line code. Clearly, this is nothing special; lots of people do exactly this same thing with PHP scripts.

    One thing I have noticed, though, is that many developers of PHP scripts are not aware of the common Unix(y) environment helper, env.

    I put this on Twitter a while ago, and it seemed to resonate with a lot of people:

    The beauty of using /usr/bin/env php instead of just /usr/local/bin/php or /usr/bin/php is that env will search your PATH to find the php you have set up for your user.

    We've mostly standardized our production and development nodes, but there's no guarantee that PHP will be in the same place on each box where we run it. env, however, is always located in /usr/bin—at least on all of the boxes we control, and on my Mac workstation.

    Maybe we're testing a new version of PHP that happens to be in /opt/php/bin/php, or maybe we have to support an old install on a different distribution than our standard, and PHP is located in /bin/php instead of /usr/bin/php. The practice of using env for this helps us push environmental configurations out of our code and into the actual environment.

    If you distribute a PHP application that has command-line scripts and shebang lines, I encourage you to adopt the practice of making your shebang line #!/usr/bin/env php.
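
    A minimal sketch of why this works, runnable without PHP installed. Here `fakephp` is a hypothetical stand-in interpreter (not a real tool), created in two temporary directories; the same script, with a `#!/usr/bin/env fakephp` shebang, runs under whichever copy comes first in PATH.

    ```shell
    # Sketch only: "fakephp" is a made-up interpreter name used for illustration.
    demo_dir=$(mktemp -d)
    mkdir "$demo_dir/a" "$demo_dir/b"

    # Create two stand-in interpreters with the same name in different directories.
    for d in a b; do
        printf '#!/bin/sh\necho "interpreter from %s"\n' "$d" > "$demo_dir/$d/fakephp"
        chmod +x "$demo_dir/$d/fakephp"
    done

    # A script whose shebang leaves the interpreter lookup to env.
    printf '#!/usr/bin/env fakephp\n' > "$demo_dir/script"
    chmod +x "$demo_dir/script"

    # The unchanged script picks up whichever fakephp is found first in PATH.
    first=$(PATH="$demo_dir/a:$PATH" "$demo_dir/script")
    second=$(PATH="$demo_dir/b:$PATH" "$demo_dir/script")
    echo "$first"
    echo "$second"

    rm -rf "$demo_dir"
    ```

    With a real `#!/usr/bin/env php` shebang the lookup is the same: changing PATH changes which php runs, and the script itself stays untouched.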

    Note that this doesn't just apply to PHP of course, but I've seen a definite lack of env in the PHP world.

  2. PHP as a templating language

    There’s been quite a bit of talk, recently, in PHP-land about templates and the ramifications of enforcing “pure” PHP scripts by preventing scripts from entering HTML mode. I’m not quite sure how I feel about this RFC, but it got me thinking about the whole idea of using PHP for templating in modern web apps.

    For many years, I was a supporter of using PHP as a templating language to render HTML. However, I really don’t buy into the idea of adding an additional abstraction layer on top of PHP, such as Smarty (and many others). In the past year or so, I’ve come to the realization that even PHP itself is no longer ideally suited to function as the templating engine of current web applications — at least not as the primary templating engine for such apps.

    The reason for this evolution is simple: modern web apps are no longer fully server-driven.

    PHP, as you know, is a server technology. Rendering HTML on the server side was fine for many years, but times have changed. Apps are becoming more and more API-driven, and JSON is quickly becoming the de facto standard for API envelopes.

    We can no longer assume that our data will be rendered in a browser, nor that it will be rendered exclusively in HTML. With Gimme Bar, we render HTML server-side (to reduce page load latency), in JavaScript (when adding or changing elements on an already-rendered page), in our API (upcoming in a future enhancement), in our iPhone app, and certainly in other places that I’m forgetting.

    Asset rendering in Gimme Bar can be complicated — especially for embed assets. We definitely don’t want to maintain the render logic in more than one place (at least not for the main app). We regularly need to render elements in both HTML and JavaScript.

    This is precisely why we don’t directly use PHP to render page elements anymore. We use Mustache (and Mustache-compatible Handlebars). This choice allows us to easily maintain one (partial) template for elements, and we can render those elements on the platform of our liking (which has been diversifying more and more lately, but is still primarily PHP and JavaScript).

    Rendering elements to HTML on the server side, even if transferred through a more dynamic method such as via XHR, really limits what can be done on the display side (where “display side” can mean many things these days — not just browsers).

    We try hard to keep the layers of our web applications separated through patterns such as Model/View/Controller, but for as long as we’ve been doing so, we’ve often put the view bits in the wrong place. This was appropriate for many years, but now it is time to leave the rendering duties up to the layer of your application that is actually performing the view. This is often your browser.

    For me, this has become the right way to do things: avoid rendering HTML exclusively on the server side, and use a technology that can push data rendering to your user’s client.

  3. Ideas of March

    A year ago, I posted about Ideas of March, which Chris got rolling.

    In it, I pledged to blog more.

    Today, I am not so proud to say that I have mostly failed to do so. If I had to come up with a reason, I'd have to say that, personally, 2011 turned out a whole lot different than I was expecting, back then—and not in a good way.

    Over the last year, however, I did post a few things that I think were interesting, and worthy of a re-read (at risk of making this post into a clip show):

    PHP Community Conference
    …a post about why I was excited about going to the PHP Community Conference in Nashville, last May. It turned out to be even better than I expected, and I'm really excited that plans are coming together for a 2012 edition.
    Gimme Bar no longer on CouchDB and Gimme Bar on MongoDB
    …a pair of posts describing some problems we had with CouchDB, and our smooth transition to MongoDB. We're still on MongoDB, and for the most part, I still really like it. I'd hinted about these posts in last year's Ideas of March post.
    Webshell
    …on Webshell, which I still use almost daily, but which has most certainly fallen out of a reasonable upkeep schedule. I really need to find some time to clean out the cobwebs. If you use HTTP and know JavaScript, you should check it out.
    Aficionado's Curse/Pessimistic Optimism
    …a post that I'm particularly proud of; mostly because I've finally managed to document (and, I hope, coin a term for) why things seem so bad, but aren't actually so bad.
    HTTP/1.0 and the Connection header
    …finally, over Christmas, I managed to post about HTTP things (-:

    I was really hoping to do more. Last year, I suggested that I might turn my talk on Fifty tips, tricks and tools into a series of small blog posts, and I'd still like to do this. Hopefully in 2012. I also have a list of other things that I'm really interested in writing about. It's just a matter of making time to do so. I plan to do that, this year. Starting with this post.

    I'd also like to get around to writing a thing or two about beer, this year…

    Much of what I said last year is still on my mind. I still miss the blogs we kept, 5+ years ago. Let's fix that.

    </navelgazing>

  4. HTTP 1.0 and the Connection header

    I have a long backlog of things to write about. One of those things is Varnish (more on that in a future post). So, over these Christmas holidays, while intentionally taking a break from real work, I decided to finally do some of the research required before I can really write about how Varnish is going to make your web apps much faster.

    To get some actual numbers, I broke out the Apache Benchmarking utility (ab), and decided to let it loose on my site (100 requests, 10 requests concurrently):

    ab -n 100 -c 10 http://seancoates.com/codes

    To my surprise, this didn't finish almost immediately. The command ran for what seemed like forever. Finally, I was presented with its output (excerpted for your reading pleasure):

    Concurrency Level:      10
    Time taken for tests:   152.476 seconds
    Complete requests:      100
    Failed requests:        0
    Write errors:           0
    Total transferred:      592500 bytes
    HTML transferred:       566900 bytes
    Requests per second:    0.66 [#/sec] (mean)
    Time per request:       15247.644 [ms] (mean)
    Time per request:       1524.764 [ms] (mean, across all concurrent requests)
    Transfer rate:          3.79 [Kbytes/sec] received

    Less than one request per second? That surely doesn't seem right. I chose /codes because the content does not depend on any sort of external service or expensive server-side processing (as described in an earlier post). Manually browsing to this same URL also feels much faster than one request per second. There's something fishy going on here.

    I thought that there might be something off with my server configuration, so in order to rule out a concurrency issue, I decided to benchmark a single request:

    ab -n 1 -c 1 http://seancoates.com/codes

    I expected this page to load in less than 200ms. That seems reasonable for a light page that has no external dependencies, and doesn't even hit a database. Instead, I got this:

    Concurrency Level:      1
    Time taken for tests:   15.090 seconds
    Complete requests:      1
    Failed requests:        0
    Write errors:           0
    Total transferred:      5925 bytes
    HTML transferred:       5669 bytes
    Requests per second:    0.07 [#/sec] (mean)
    Time per request:       15089.559 [ms] (mean)
    Time per request:       15089.559 [ms] (mean, across all concurrent requests)
    Transfer rate:          0.38 [Kbytes/sec] received

    Over 15 seconds to render a single page‽ Clearly, this isn't what's actually happening on my site. I can confirm this with a browser, or very objectively with time and curl:

    $ time curl -s http://seancoates.com/codes > /dev/null
    
    real  0m0.122s
    user  0m0.000s
    sys   0m0.010s

    The next step is to figure out what ab is actually doing that's taking an extra ~15 seconds. Let's crank up the verbosity (might as well go all the way to 11).

    $ ab -v 11 -n 1 -c 1 http://seancoates.com/codes
    (snip)
    Benchmarking seancoates.com (be patient)...INFO: POST header == 
    ---
    GET /codes HTTP/1.0
    Host: seancoates.com
    User-Agent: ApacheBench/2.3
    Accept: */*
    
    
    ---
    LOG: header received:
    HTTP/1.1 200 OK
    Date: Mon, 26 Dec 2011 16:27:32 GMT
    Server: Apache/2.2.17 (Ubuntu) DAV/2 SVN/1.6.12 mod_fcgid/2.3.6 mod_ssl/2.2.17 OpenSSL/0.9.8o PHP/5.3.2
    X-Powered-By: PHP/5.3.2
    Vary: Accept-Encoding
    Content-Length: 5669
    Content-Type: text/html
    
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    (HTML snipped from here)
    
    LOG: Response code = 200
    ..done
    
    (snip)

    This all looked just fine. The really strange thing is that the output stalled right after LOG: Response code = 200 and right before ..done. So, something was causing ab to stall after the request was answered (we got a 200, and it's a small number of bytes).

    This is the part where I remembered that I've seen a similar behaviour before. I've lost countless hours of my life (and now one more) to this problem: some clients (such as PHP's streams) don't handle Keep-Alives in the way that one might expect.

    HTTP is hard. Really hard. Way harder than you think. Actually, it's not that hard if you remember that what you think is probably wrong if you're not absolutely sure that you're right.

    ab or httpd does the wrong thing. I'm not sure which one, and I'm not even 100% sure it's wrong (because the behaviour is not defined in the spec as far as I can tell), but since it's Apache Bench, and Apache httpd, we're talking about here, we'd think they could work together. We'd be wrong, though.

    Here's what's happening: ab is sending an HTTP/1.0 request with no Connection header, and httpd is assuming that it wants to keep the connection open, despite this. So, httpd hangs on to the socket for an additional (you guessed it) 15 seconds after the request is answered.
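
    A rough cross-check, using only figures from the two ab runs above: 100 requests at a concurrency of 10 is 10 rounds of requests, and if each request stalls for about as long as the single-request run did, the total lands close to the 152.476 seconds ab reported. The variable names are illustrative.

    ```shell
    # Figures copied from the ab output earlier in this post.
    estimate=$(awk 'BEGIN {
        requests = 100            # -n from the first run
        concurrency = 10          # -c from the first run
        stalled_request = 15.090  # seconds taken by the single-request run
        printf "%.1f", (requests / concurrency) * stalled_request
    }')
    echo "estimated total: ${estimate} seconds"
    ```

    The estimate comes out at 150.9 seconds, so the per-request stall accounts for nearly all of the time in the first benchmark.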

    There are two easy ways to solve this. First, we can tell ab to actually use keep-alives properly with the -k argument. With -k, ab expects the server to keep the socket open, awaiting further requests on the same socket, so it drops the connection on the client side as soon as the request is complete rather than waiting for the server to close it. In the previous scenario, the server behaved the same way, but the client waited for the server to close the connection.

    A more reliable way to ensure that the server closes the connection (and to avoid strange keep-alive related benchmarking artifacts) is to explicitly tell the server to close the connection instead of assuming that it should be kept open. This can be easily accomplished by sending a Connection: close header along with the request:

    $ ab -H "Connection: close" -n1 -c1 http://seancoates.com/codes
    (snip)
    Concurrency Level:      1
    Time taken for tests:   0.118 seconds
    Complete requests:      1
    Failed requests:        0
    Write errors:           0
    Total transferred:      5944 bytes
    HTML transferred:       5669 bytes
    Requests per second:    8.48 [#/sec] (mean)
    Time per request:       117.955 [ms] (mean)
    Time per request:       117.955 [ms] (mean, across all concurrent requests)
    Transfer rate:          49.21 [Kbytes/sec] received
    (snip)

    118ms? That's more like it! A longer, more aggressive (and concurrent) benchmark gives me a result of 88.25 requests per second. That's in the ballpark of what I was expecting for this hardware and URL.

    The moral of the story: state the persistent connection behaviour explicitly whenever making HTTP requests.
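
    As a small illustration of that moral, here is a sketch of a hand-built HTTP/1.0 request that states its connection behaviour explicitly. `build_request` is a hypothetical helper name, and the commented-out commands assume network access plus a netcat or curl binary.

    ```shell
    # Sketch: build a raw HTTP/1.0 request with an explicit Connection header.
    # $1 is the host, $2 is the path.
    build_request() {
        printf 'GET %s HTTP/1.0\r\n' "$2"
        printf 'Host: %s\r\n' "$1"
        printf 'Connection: close\r\n'
        printf '\r\n'
    }

    # Print the request with carriage returns stripped so it is readable.
    build_request seancoates.com /codes | tr -d '\r'

    # To actually send it (needs network access and a netcat binary):
    #   build_request seancoates.com /codes | nc seancoates.com 80
    # With curl, a roughly equivalent request would be:
    #   curl -s --http1.0 -H "Connection: close" http://seancoates.com/codes
    ```

    The request is the same as the one ab sent in the verbose output above, with one extra line telling the server to close the connection once it has answered.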

  5. Gimme Bar on MongoDB

    I'm happy to report that Gimme Bar has been running very well on MongoDB since early February of this year. I previously posted on some of the reasons we decided to move off of CouchDB. If you haven't read that, please consider it a prerequisite for the consumption of this post.

    Late last year, I knew that we had no choice but to get off of CouchDB. I was dreading the port. The dread was two-fold. I dreaded learning new database software, its client interface, administration techniques, and general domain-knowledge, but I also dreaded taking time away from progress on Gimme Bar to do something that I knew would help us in the long term, but was hard to justify from a "product" standpoint.

    I did a lot of reading on MongoDB, and I consulted with Andrei, who'd been using MongoDB with Mapalong since they launched. In the quiet void left by the holiday, on New Year's day this year, I seized the opportunity of absent co-workers, branched our git repository, put fingers-to-keyboard—which I suppose is the coding version of pen-to-paper—and started porting Gimme Bar to Mongo.

    I expected the road to MongoDB to be long, twisty, and paved with uncertainty. Instead, what I found was remarkable—incredible, even.

    Kristina Chodorow has done a near-perfect job of creating the wonderful tandem that makes up PHP's MongoDB extension and its most-excellent documentation. If it weren't for Kristina (and her employer, 10gen, for dedicating her time to this), the porting might have been as expected: difficult and lengthy. Instead, the experience was pleasant and straightforward. We're not really used to this type of luxury in the PHP world. (-:

    From the start, I knew that our choice of technologies carried a certain amount of risk. I'm kind of a risk-averse person, so I like to weigh the benefits (some of which I outlined in the aforementioned post), and mitigate this risk whenever possible. My mitigation technique involved making my models as dumb as possible about what happens in the code between the models and the database. I wasn't 100% successful in keeping things entirely separate, but the abstraction really paid off. I had to write a lot of code, still, but I didn't have to worry too much about how deep this code had to reach. Other than a few cases, I swapped my CouchDB client code out for an extremely thin wrapper/helper class and re-wrote my queries. The whole process took only around two weeks (of most of my time). Testing, syncing everyone, rebuilding production images and development virtual machine images, and deployment took at least as long.

    That was the story part. Here comes the opinion part (and remember, this is just my opinion; I could very well be wrong).

    After using both, extensively (for a very specific application, admittedly), I firmly believe that MongoDB is a superior NoSQL datastore solution for PHP-based, non-distributed (think Dropbox), non-mobile web applications.

    This opinion stems almost fully from Mongo's rich query API. In the current version of Gimme Bar, we have a single map/reduce job (for tags). Everything else has been replaced by a straightforward and familiar query. The map/reduce is actually practical, and things like sorting and counting are a breeze with Mongo's cursors. I did have to cheat in a few places that I don't expect to scale very well (I used $in when I should denormalize), but the beauty of this is that I can do these things now, where with Couch, my only option was to denormalize and map. Yes, I know this carries a scaling/sharding and performance penalty, but you know what? I don't care yet. ("Yet" is very important.)

    MongoDB also provides a few other things to developers that were absent in CouchDB. For example, PHP speaks to Mongo through the extension and a native driver. CouchDB uses HTTP for transport. HTTP carries a lot of overhead when you need to do a lot of single-document requests (for example, when topping up a pagination set that's had records de-duplicated). My favourite difference, though, is in the atomic operations, such as findAndModify, which make a huge difference both logic- and performance-wise, at least for Gimme Bar.

    Of course, there are two sides to every coin. There are CouchDB features that I miss. Namely: replication, change notification, CouchDB-Lucene (we're using ElasticSearch and manual indexing now), and Futon.

    Do I think MongoDB is superior to CouchDB? It depends what you're using it for. If you need truly excellent eventual-consistency replication, CouchDB might be a better choice. If you want to have your JavaScript applications talk directly to the datastore, CouchDB is definitely the way to go. Do I have a problem with CouchDB, its developers, or its community? Not at all. It's just not a good fit for the kind of app we're building.

    The bottom line is that I'm extremely happy with our port to MongoDB, and I don't have any regrets about switching other than not doing it sooner.