
Gimme Bar on MongoDB

I'm happy to report that Gimme Bar has been running very well on MongoDB since early February of this year. I previously posted on some of the reasons we decided to move off of CouchDB. If you haven't read that, please consider it a prerequisite for the consumption of this post.

Late last year, I knew that we had no choice but to get off of CouchDB. I was dreading the port, and the dread was two-fold. I dreaded learning new database software, along with its client interface, administration techniques, and general domain knowledge. I also dreaded taking time away from progress on Gimme Bar to do something that I knew would help us in the long term but was hard to justify from a "product" standpoint.

I did a lot of reading on MongoDB, and I consulted with Andrei, who'd been using MongoDB with Mapalong since they launched. In the quiet void left by the holiday, on New Year's Day this year, I seized the opportunity presented by absent co-workers, branched our git repository, put fingers to keyboard (which I suppose is the coding version of pen to paper), and started porting Gimme Bar to Mongo.

I expected the road to MongoDB to be long, twisty, and paved with uncertainty. Instead, what I found was remarkable. Incredible, even.

Kristina Chodorow has done a near-perfect job of creating the wonderful tandem that makes up PHP's MongoDB extension and its most-excellent documentation. If it weren't for Kristina (and her employer, 10gen, for dedicating her time to this), the port might have been what I expected: difficult and lengthy. Instead, the experience was pleasant and straightforward. We're not really used to this type of luxury in the PHP world. (-:

From the start, I knew that our choice of technologies carried a certain amount of risk. I'm kind of a risk-averse person, so I like to weigh the benefits (some of which I outlined in the aforementioned post) and mitigate risk whenever possible. My mitigation technique involved making my models as dumb as possible about what happens in the code between the models and the database. I wasn't 100% successful in keeping things entirely separate, but the abstraction really paid off. I still had to write a lot of code, but I didn't have to worry too much about how deep that code had to reach. Apart from a few cases, I swapped my CouchDB client code out for an extremely thin wrapper/helper class and rewrote my queries. The whole process took only about two weeks (of most of my time). Testing, syncing everyone, rebuilding production images and development virtual machine images, and deployment took at least as long.
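To make the "dumb models" idea concrete, here is a minimal sketch of that kind of layering. Everything in it is hypothetical: `DocumentStore`, `InMemoryStore`, `Asset`, and the `assets` collection are illustrative names, not Gimme Bar's actual classes, and an in-memory backend stands in for the real CouchDB or MongoDB helper. The point is only that the model talks to a narrow storage interface, so swapping the backend leaves the model untouched.

```python
from typing import Any, Dict, List, Optional, Protocol

Doc = Dict[str, Any]


class DocumentStore(Protocol):
    """The only storage surface the models see (hypothetical interface)."""

    def get(self, collection: str, doc_id: str) -> Optional[Doc]: ...

    def save(self, collection: str, doc: Doc) -> None: ...

    def find(self, collection: str, criteria: Doc) -> List[Doc]: ...


class InMemoryStore:
    """Stand-in backend. A CouchDB- or MongoDB-backed helper would expose the
    same three methods, so swapping it does not touch the models."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Doc]] = {}

    def get(self, collection: str, doc_id: str) -> Optional[Doc]:
        doc = self._data.get(collection, {}).get(doc_id)
        return dict(doc) if doc is not None else None

    def save(self, collection: str, doc: Doc) -> None:
        self._data.setdefault(collection, {})[doc["_id"]] = dict(doc)

    def find(self, collection: str, criteria: Doc) -> List[Doc]:
        # Plain equality matching, similar in spirit to a basic Mongo query document.
        return [
            dict(doc)
            for doc in self._data.get(collection, {}).values()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


class Asset:
    """A deliberately 'dumb' model: it knows its fields, not its storage."""

    COLLECTION = "assets"  # hypothetical collection name

    def __init__(self, store: DocumentStore) -> None:
        self.store = store

    def create(self, asset_id: str, owner: str, url: str) -> None:
        self.store.save(self.COLLECTION, {"_id": asset_id, "owner": owner, "url": url})

    def for_owner(self, owner: str) -> List[Doc]:
        return self.store.find(self.COLLECTION, {"owner": owner})


assets = Asset(InMemoryStore())
assets.create("a1", "alice", "http://example.com/")
assets.create("a2", "bob", "http://example.org/")
print([doc["_id"] for doc in assets.for_owner("alice")])  # ['a1']
```

Under this arrangement, a port mostly means writing a new class with the same three methods and rewriting the queries inside it, which matches the shape of the work described above.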

That was the story part. Here comes the opinion part (and remember, this is just my opinion; I could very well be wrong).

After using both extensively (for a very specific application, admittedly), I firmly believe that MongoDB is a superior NoSQL datastore solution for PHP-based, non-distributed (i.e., not something like Dropbox), non-mobile web applications.

This opinion stems almost entirely from Mongo's rich query API. In the current version of Gimme Bar, we have a single map/reduce job (for tags). Everything else has been replaced by a straightforward, familiar query. The map/reduce is actually practical, and things like sorting and counting are a breeze with Mongo's cursors. I did have to cheat in a few places that I don't expect to scale very well (I used $in where I should have denormalized), but the beauty of this is that I can do these things now, whereas with Couch my only option was to denormalize and map. Yes, I know this carries a scaling/sharding and performance penalty, but you know what? I don't care yet. ("Yet" is very important.)
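A tag count is the textbook map/reduce shape: the map step emits a `(tag, 1)` pair for every tag on every document, and the reduce step sums the values for each tag. MongoDB runs map and reduce as JavaScript functions on the server; the sketch below only emulates that data flow in plain Python, assuming each document carries a hypothetical `tags` list. It also shows why a reduce function should return a value of the same shape it receives: the server may re-reduce partial results, and summing partial sums still gives the right total.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Doc = Dict[str, Any]


def map_tags(doc: Doc) -> Iterator[Tuple[str, int]]:
    """Map step: emit (tag, 1) for every tag on a document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_counts(tag: str, values: List[int]) -> int:
    """Reduce step: sum the counts. The result has the same shape as a mapped
    value, so it is safe to feed back into this function again."""
    return sum(values)


def map_reduce(docs: Iterable[Doc]) -> Dict[str, int]:
    """Group mapped values by key, then reduce each group."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for doc in docs:
        for key, value in map_tags(doc):
            grouped[key].append(value)
    return {key: reduce_counts(key, values) for key, values in grouped.items()}


def re_reduce(partials: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Combine partially reduced results, as a server may do across batches."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for partial in partials:
        for key, value in partial.items():
            grouped[key].append(value)
    return {key: reduce_counts(key, values) for key, values in grouped.items()}


docs = [
    {"_id": 1, "tags": ["php", "mongodb"]},
    {"_id": 2, "tags": ["php"]},
    {"_id": 3, "tags": []},
]
print(map_reduce(docs))  # {'php': 2, 'mongodb': 1}
print(re_reduce([map_reduce(docs[:1]), map_reduce(docs[1:])]) == map_reduce(docs))  # True
```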

MongoDB also provides a few other things to developers that were absent in CouchDB. For example, PHP talks to Mongo through the extension, which is a native driver, while CouchDB uses HTTP for transport. HTTP carries a lot of overhead when you need to make many single-document requests (for example, when topping up a page of results after duplicate records have been removed from it). My favourite difference, though, is the atomic operations, such as findAndModify, which make a huge difference both logic- and performance-wise, at least for Gimme Bar.
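The value of findAndModify is that matching a document and updating it happen as one indivisible step, and the modified document comes back in the same call. The sketch below emulates that guarantee in memory with a lock, purely to illustrate the semantics; the real command runs server-side, and `TinyCollection`, `find_and_modify`, and the job-queue scenario are hypothetical, not Gimme Bar's code. Eight threads compete to claim 100 pending jobs, and because the match-and-update is atomic, no job is claimed twice.

```python
import threading
from typing import Any, Dict, List, Optional

Doc = Dict[str, Any]


class TinyCollection:
    """In-memory stand-in that emulates findAndModify's guarantee: the match
    and the update happen together, so no other caller can slip in between."""

    def __init__(self, docs: List[Doc]) -> None:
        self._docs = docs
        self._lock = threading.Lock()

    def find_and_modify(self, query: Doc, set_fields: Doc) -> Optional[Doc]:
        with self._lock:
            for doc in self._docs:
                if all(doc.get(key) == value for key, value in query.items()):
                    doc.update(set_fields)
                    return dict(doc)  # return the modified document
            return None  # nothing matched


jobs = TinyCollection([{"_id": i, "state": "pending"} for i in range(100)])
claimed: List[int] = []
claimed_lock = threading.Lock()


def worker() -> None:
    """Keep claiming pending jobs until none are left."""
    while True:
        job = jobs.find_and_modify({"state": "pending"}, {"state": "processing"})
        if job is None:
            return
        with claimed_lock:
            claimed.append(job["_id"])


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(claimed), len(set(claimed)))  # 100 100: every job claimed exactly once
```

Doing the same thing as a separate read followed by a write would leave a window in which two workers could read the same pending job; with CouchDB's HTTP API, it would also cost an extra round trip per job.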

Of course, there are two sides to every coin. There are CouchDB features that I miss, namely replication, change notification, CouchDB-Lucene (we're now using ElasticSearch with manual indexing), and Futon.

Do I think MongoDB is superior to CouchDB? It depends on what you're using it for. If you need truly excellent eventual-consistency replication, CouchDB might be a better choice. If you want your JavaScript applications to talk directly to the datastore, CouchDB is definitely the way to go. Do I have a problem with CouchDB, its developers, or its community? Not at all. It's just not a good fit for the kind of app we're building.

The bottom line is that I'm extremely happy with our port to MongoDB, and I don't have any regrets about switching, other than not having done it sooner.