1. Gimme Bar on MongoDB

    I'm happy to report that Gimme Bar has been running very well on MongoDB since early February of this year. I previously posted on some of the reasons we decided to move off of CouchDB. If you haven't read that, please consider it a prerequisite for the consumption of this post.

    Late last year, I knew that we had no choice but to get off of CouchDB. I was dreading the port. The dread was two-fold. I dreaded learning new database software, its client interface, administration techniques, and general domain knowledge, but I also dreaded taking time away from progress on Gimme Bar to do something that I knew would help us in the long term, but was hard to justify from a "product" standpoint.

    I did a lot of reading on MongoDB, and I consulted with Andrei, who'd been using MongoDB with Mapalong since they launched. In the quiet void left by the holiday, on New Year's Day this year, I seized the opportunity of absent co-workers, branched our git repository, put fingers-to-keyboard (which I suppose is the coding version of pen-to-paper), and started porting Gimme Bar to Mongo.

    I expected the road to MongoDB to be long, twisty, and paved with uncertainty. Instead, what I found was remarkable—incredible, even.

    Kristina Chodorow has done a near-perfect job of creating the wonderful tandem that makes up PHP's MongoDB extension and its most-excellent documentation. If it weren't for Kristina (and her employer, 10gen, for dedicating her time to this), the porting might have been as expected: difficult and lengthy. Instead, the experience was pleasant and straightforward. We're not really used to this type of luxury in the PHP world. (-:

    From the start, I knew that our choice of technologies carried a certain amount of risk. I'm kind of a risk-averse person, so I like to weigh the benefits (some of which I outlined in the aforementioned post), and mitigate this risk whenever possible. My mitigation technique involved making my models as dumb as possible about what happens in the code between the models and the database. I wasn't 100% successful in keeping things entirely separate, but the abstraction really paid off. I had to write a lot of code, still, but I didn't have to worry too much about how deep this code had to reach. Other than a few cases, I swapped my CouchDB client code out for an extremely thin wrapper/helper class and re-wrote my queries. The whole process took only around two weeks (of most of my time). Testing, syncing everyone, rebuilding production images and development virtual machine images, and deployment took at least as long.
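
    As a rough illustration of that kind of separation (this is not Gimme Bar's actual code, which is PHP), the sketch below keeps a model ignorant of its backend by routing every read and write through a two-method store. The names `DocumentStore`, `InMemoryStore`, and `Asset` are all hypothetical, and the in-memory backend merely stands in for a CouchDB or MongoDB wrapper.

```python
from typing import Optional, Protocol


class DocumentStore(Protocol):
    """The only storage surface the models are allowed to see (hypothetical)."""

    def get(self, collection: str, doc_id: str) -> Optional[dict]: ...
    def put(self, collection: str, doc_id: str, doc: dict) -> None: ...


class InMemoryStore:
    """Stand-in backend; a real database wrapper would expose the same two methods."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, collection: str, doc_id: str) -> Optional[dict]:
        doc = self._data.get((collection, doc_id))
        return dict(doc) if doc is not None else None

    def put(self, collection: str, doc_id: str, doc: dict) -> None:
        self._data[(collection, doc_id)] = dict(doc)


class Asset:
    """A model that knows its own fields but not which database holds them."""

    def __init__(self, asset_id: str, url: str) -> None:
        self.asset_id = asset_id
        self.url = url

    def save(self, store: DocumentStore) -> None:
        store.put("assets", self.asset_id, {"url": self.url})

    @classmethod
    def load(cls, store: DocumentStore, asset_id: str) -> Optional["Asset"]:
        doc = store.get("assets", asset_id)
        return cls(asset_id, doc["url"]) if doc is not None else None
```

    With this shape, swapping databases means writing one new class with `get` and `put`; the model code does not change. Real queries rarely fit into two methods, which is presumably where the "few cases" that reached deeper come from.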

    That was the story part. Here comes the opinion part (and remember, this is just my opinion; I could very well be wrong).

    After using both, extensively (for a very specific application, admittedly), I firmly believe that MongoDB is a superior NoSQL datastore solution for PHP-based, non-distributed (think Dropbox), non-mobile web applications.

    This opinion stems almost fully from Mongo's rich query API. In the current version of Gimme Bar, we have a single map/reduce job (for tags). Everything else has been replaced by a straightforward and familiar query. The map/reduce is actually practical, and things like sorting and counting are a breeze with Mongo's cursors. I did have to cheat in a few places that I don't expect to scale very well (I used $in when I should denormalize), but the beauty of this is that I can do these things now, where with Couch, my only option was to denormalize and map. Yes, I know this carries a scaling/sharding and performance penalty, but you know what? I don't care yet. ("Yet" is very important.)
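
    To make the `$in` versus denormalize trade-off concrete, here is a small sketch in plain Python. The filter document `{"owner": {"$in": [...]}}` uses MongoDB's real query syntax, but everything else is an assumption for illustration: the follower/asset data, the function names, and the tiny `matches` evaluator, which only mimics the one operator and is not a MongoDB client.

```python
# Hypothetical data: users follow other users; each asset has one owner.
follows = {"sean": ["andrei", "kristina"]}
assets = [
    {"_id": 1, "owner": "andrei", "title": "map"},
    {"_id": 2, "owner": "jason", "title": "chart"},
    {"_id": 3, "owner": "kristina", "title": "driver docs"},
]


def in_query(field, values):
    """Build a MongoDB-style filter document: {field: {"$in": values}}."""
    return {field: {"$in": list(values)}}


def matches(doc, query):
    """Tiny evaluator for the single operator used here; illustration only."""
    return all(doc.get(field) in cond["$in"] for field, cond in query.items())


def feed_with_in(user):
    """Read-time approach: look up who the user follows, then filter by owner."""
    query = in_query("owner", follows[user])
    return [d["_id"] for d in assets if matches(d, query)]


def denormalise(docs, follow_map):
    """Write-time approach: copy onto each asset the users who should see it."""
    out = []
    for doc in docs:
        viewers = sorted(u for u, owners in follow_map.items() if doc["owner"] in owners)
        out.append({**doc, "visible_to": viewers})
    return out


def feed_denormalised(user, docs):
    """Reads become a single membership test; no list of owners is needed."""
    return [d["_id"] for d in docs if user in d["visible_to"]]
```

    Both `feed_with_in("sean")` and `feed_denormalised("sean", denormalise(assets, follows))` select the same assets. The difference is where the cost lands: the `$in` list grows with the number of people followed, while the denormalized copy has to be rewritten whenever the follow graph changes.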

    MongoDB also provides a few other things to developers that were absent in CouchDB. For example, PHP speaks to Mongo through the extension and a native driver. CouchDB uses HTTP for transport. HTTP carries a lot of overhead when you need to do a lot of single-document requests (for example, when topping up a pagination set that's had records de-duplicated). My favourite difference, though, is in the atomic operations, such as findAndModify, which make a huge difference both logic- and performance-wise, at least for Gimme Bar.
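
    The value of an atomic find-and-modify is easiest to see when several workers compete for the same documents. The sketch below is an in-memory stand-in, not the MongoDB driver: `TinyCollection` is a hypothetical class, its lock plays the role of the server's atomicity guarantee, and its `update` argument is a plain field overwrite rather than MongoDB's update operators. Like MongoDB's `findAndModify` in its default mode, it returns the document as it was before the update.

```python
import threading


class TinyCollection:
    """In-memory stand-in for a collection (hypothetical, illustration only)."""

    def __init__(self, docs):
        self._docs = [dict(d) for d in docs]
        self._lock = threading.Lock()  # stands in for server-side atomicity

    def find_and_modify(self, query, update):
        """Find the first match, apply the update, return the pre-update copy.

        The find and the modify happen under one lock, so no other caller
        can match the same document in between.
        """
        with self._lock:
            for doc in self._docs:
                if all(doc.get(k) == v for k, v in query.items()):
                    before = dict(doc)
                    doc.update(update)
                    return before
        return None


jobs = TinyCollection([{"_id": i, "state": "pending"} for i in range(5)])
claimed = []


def worker():
    """Claim pending jobs until none are left."""
    while True:
        job = jobs.find_and_modify({"state": "pending"}, {"state": "claimed"})
        if job is None:
            return
        claimed.append(job["_id"])


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

    Each job ends up claimed exactly once. With a separate "find, then update" pair, two workers could both read the same pending job before either marked it as claimed, and the application would need its own locking or retry logic to compensate.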

    Of course, there are two sides to every coin. There are CouchDB features that I miss. Namely: replication, change notification, CouchDB-Lucene (we're using ElasticSearch and manual indexing now), and Futon.

    Do I think MongoDB is superior to CouchDB? It depends what you're using it for. If you need truly excellent eventual-consistency replication, CouchDB might be a better choice. If you want to have your JavaScript applications talk directly to the datastore, CouchDB is definitely the way to go. Do I have a problem with CouchDB, their developers or their community? Not at all. It's just not a good fit for the kind of app we're building.

    The bottom line is that I'm extremely happy with our port to MongoDB, and I don't have any regrets about switching other than not doing it sooner.

    7 Responses

    • I am in a romance with MongoDB for web applications (finding it to be a perfect fit) myself. But you make one point that didn't occur to me until, well, now: the PHP driver is awesome, and indeed superior to what we are used to.

      My biggest take-away from working a year with MongoDB is how natural it is. For web apps it just seems to bloody fit almost every time without the need to think about the layout.

    • I think the last sentence of your penultimate paragraph sums it up perfectly. No tool will tick every requirement, and most will do certain things better than others.

      Choose the technology that suits your application best, rather than forcing your app to work with your choice of technology.

      We are using CouchDB on a major rewrite of a MySQL-backed system, partly because the replication feature is extremely important for us. We've sometimes had to work out how Couch is supposed to be used, but mainly through inexperience using it and coming from a SQL background.

    • Great read! pragmatism wins

    • Jason Pirkey

      2011 May 06 12:31

      When you mention replication as a feature of CouchDB you miss, MongoDB has replication: http://www.mongodb.org/display/DOCS/Replication including replication sets: http://www.mongodb.org/display/DOCS/Replica+Sets

      Or is there a specific type of replication that CouchDB offers that MongoDB doesn't?

    • Hi Jason.

      Thanks for mentioning this. Yes, Mongo has replication (and the replica sets are really nice).

      With CouchDB, I can replicate to anywhere at any time, incrementally, and very easily. With Mongo, this is possible, but not nearly as simple and reliable as CouchDB (as far as I can tell).

      This is why I think CouchDB is a really good solution for eventually-consistent (read: possibly offline) applications, such as on mobile.

      S

    • Jason Pirkey

      2011 May 06 12:53

      Hey Sean,

      Yeah I figured there was something I was missing in that statement (hence my last question).  Thanks for the reply.

      Cheers

    • So far, the one limitation I've found in using MongoDB, compared with a typical database, is that I cannot do an $and operation on the same field; otherwise I can avoid SQL almost completely. The flexibility is unparalleled in terms of how generically you can build your application and still scale the design later, because you don't have to worry about schema restrictions. SQL has its place, but this technology allows you to just start building and get things up quickly.