Gimme Bar no longer on CouchDB

2011-May-02

As mentioned in a previous post, we started building Gimme Bar a little over a year ago. We did a lot of things right, but we also did some things wrong.

Since, in those early days, I was the only developer, and since most of my professional development experience is in PHP, that was the obvious choice of language. I also started building the API before the front-end. I chose a really simple routing model for the back-end, and got to work, sans framework. Our back-end code is still really lean (for the most part), and I'm still (mostly (-: ) proud of it.

When it came time to select a datastore, I chose something a bit more risky, with Cameron's blessing.

Having just spent the best part of a year and a half working with PostgreSQL at OmniTI, I felt it was time to try something new. We knew this carried risks, but the timing was good, and—quite frankly—I was simply bored of hacking on stored procedures in PL/pgSQL. We wanted something that could be expected to scale (eventually, when we need it), without deep in-house expertise, but also something that I'd find fun to work on. I love learning new things, so we thought we'd give one of the NoSQL solutions a whirl.

In those days (January 2010), the main NoSQL contenders for building a web application were—at least in our minds—CouchDB and MongoDB. Also in those days, MongoDB didn't quite feel like it was ready for us. I could, of course, be wrong, but I figured that since I didn't know either of these systems very well, the best way to find out was to just pick one and go with it. The thing that ultimately pushed us to the CouchDB camp was a mild professional relationship with some of the CouchDB folks. So, we built the first versions of Gimme Bar on top of Linux, Apache, PHP 5.3, Lithium (on the front-end), jQuery and CouchDB.

By summer 2010, we began work on adding social features (which have since been hidden) to Gimme Bar, and CouchDB started giving us trouble. This wasn't CouchDB's fault, really. It was more of an architectural problem. We were trying to solve a relational problem with a database that, by design, knew nothing about how to handle relationships.

Now might be a good time to explain document-independence and map/reduce, but I fear that would take up more attention than you've kindly offered to this article, and it's going to be long even without a detailed tutorial. Here's the short version: CouchDB stores structured objects as (JSON) documents. These documents don't know anything about their peers. To "query" (for lack of a better term) Couch, you need to write a map function (in JavaScript or Erlang, by default) that is passed all documents in the database and emits keys and values to an index that matches your map's criteria. These keys can be (roughly) sorted, and to "query" your documents, you jump to a specific part of this sorted index and grab one or more documents in the sequence. From what I understand of map/reduce (and my only practical experience so far is with CouchDB), this is how other systems such as Hadoop work, too.
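To make the short version a little more concrete, here is a minimal sketch of the idea in plain PHP (assuming PHP 7 or later). It is not CouchDB code: the documents, field names, and the queryRange() helper are all hypothetical, and real CouchDB key collation is more involved than PHP's array comparison. It only illustrates the shape of "map each document in isolation, sort the emitted keys, then query by jumping to a range of keys."

```php
<?php
// Hypothetical documents; CouchDB would store these as JSON.
$docs = [
    ['_id' => 'a1', 'type' => 'asset', 'owner' => 'bob',   'captured_at' => 300],
    ['_id' => 'a2', 'type' => 'asset', 'owner' => 'chris', 'captured_at' => 100],
    ['_id' => 'u1', 'type' => 'user',  'name'  => 'aaron'],
    ['_id' => 'a3', 'type' => 'asset', 'owner' => 'bob',   'captured_at' => 200],
];

// The "map" step: called once per document, in isolation, and free to
// emit zero or more key/value rows.
$map = function (array $doc): array {
    if ($doc['type'] !== 'asset') {
        return [];
    }
    return [['key' => [$doc['owner'], $doc['captured_at']], 'value' => $doc['_id']]];
};

// Build the index: run the map over every document, then sort rows by key.
$index = [];
foreach ($docs as $doc) {
    foreach ($map($doc) as $row) {
        $index[] = $row;
    }
}
usort($index, function ($a, $b) {
    return $a['key'] <=> $b['key'];
});

// "Querying" is a range scan over the sorted keys. A real index would seek
// to the start key; this sketch scans linearly for brevity.
function queryRange(array $index, array $startKey, array $endKey): array
{
    $values = [];
    foreach ($index as $row) {
        if ($row['key'] >= $startKey && $row['key'] <= $endKey) {
            $values[] = $row['value'];
        }
    }
    return $values;
}

// All of bob's assets, oldest first.
$bobsAssets = queryRange($index, ['bob', 0], ['bob', PHP_INT_MAX]);
```

Note that the map function never sees more than one document at a time, which is exactly why anything that needs to relate documents to each other has to be encoded into the keys.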

There is tremendous value to a system like this. Once the index is generated, it can be incrementally updated, and querying a huge dataset is fast and efficient. The reduce side of map/reduce (we had barely a handful of reduce functions) is also incredibly powerful for calculating aggregates of the map data, but it's also intentionally limited to small subsets of the mapped data. These intentional limits allow map/reduce functions to be highly parallelizable. To run a map on 100 servers, the dataset can be split into 100 pieces, and each server can process its individual chunk safely and in parallel.
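The chunking idea can be sketched in a few lines of PHP (assuming PHP 7 or later). The values and the three-way split are made up for illustration, and this is not how CouchDB itself schedules work; the point is only that a reduce function which accepts either raw mapped values or earlier partial results can be run on each chunk independently.

```php
<?php
// Hypothetical mapped values (for example, one "1" emitted per captured asset).
$mapped = [1, 1, 1, 1, 1, 1, 1];

// A reduce function has to give the same answer whether it sees raw mapped
// values or the partial results of earlier reduces.
$reduce = function (array $values): int {
    return array_sum($values);
};

// Split the work as if across several workers and reduce each chunk on its own...
$partials = array_map($reduce, array_chunk($mapped, 3));

// ...then reduce the partial results into the final aggregate.
$total = $reduce($partials);

echo $total, "\n";
```

A sum works well here because it only ever needs the values in front of it; an aggregate that needs to see the whole dataset at once does not fit this model.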

This power and flexibility has an architectural cost. Over a decade of professional development with various relational databases taught me that in order to keep one's schema descriptive and robust, one must always (for small values of "always") keep data normalized until a performance problem forces denormalization. With a document-oriented datastore like CouchDB or MongoDB, denormalization is part of the design.

A while ago, I made an extremely stripped-down example of how something like user relationships is handled in Gimme Bar with CouchDB. This document is for the user named "aaron" (_id: c988a29740241c7d20fc7974be05ec54). Aaron is following bob (_id: c988a29740241c7d20fc7974be05f67d), chris (_id: c988a29740241c7d20fc7974be05ff71), and dale (_id: c988a29740241c7d20fc7974be060bb4). You can see the references to the "following" users in aaron's document. I also published example maps of how someone might go about querying this (small) set.

The specific problem that we ran into with CouchDB is that our "timeline" page showed the collected assets of users that the currently-logged-in user is following. So, aaron would see assets that belong to bob, chris and dale. This, in itself, isn't terribly difficult; we just needed to query once for each of aaron's follows. The problem was further complicated when a requirement was raised to not only see the above, but also to collapse duplicates into one displayed asset (if bob and chris collected the same asset, aaron would only see it once). Oh, and also, these assets needed to be sorted by their capture time. These requirements made the chain of documents extremely complicated to query. In a relational system, a few (admittedly expensive) joins would have taken care of it in short order.
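Stated as plain application code, the requirement looks something like the following sketch (assuming PHP 7 or later). Everything here is hypothetical: the field names (media_hash, captured_at), the in-memory data, the buildTimeline() helper, the choice to keep the most recent capture of a duplicate, and the newest-first ordering are all assumptions made for illustration, not Gimme Bar's actual code. It is easy with everything in memory; the difficulty described above is getting the same result out of independent documents and a sorted key index.

```php
<?php
// Hypothetical in-memory data: each followed user's captured assets.
$assetsByUser = [
    'bob'   => [
        ['media_hash' => 'h1', 'captured_at' => 100],
        ['media_hash' => 'h2', 'captured_at' => 300],
    ],
    'chris' => [
        ['media_hash' => 'h1', 'captured_at' => 200], // same asset bob captured
    ],
    'dale'  => [
        ['media_hash' => 'h3', 'captured_at' => 250],
    ],
];

function buildTimeline(array $assetsByUser, array $following): array
{
    $byHash = [];
    foreach ($following as $user) {
        foreach ($assetsByUser[$user] ?? [] as $asset) {
            $hash = $asset['media_hash'];
            // Collapse duplicates: keep the most recent capture of each asset
            // (an assumption; the requirement only says "see it once").
            if (!isset($byHash[$hash]) || $asset['captured_at'] > $byHash[$hash]['captured_at']) {
                $byHash[$hash] = $asset + ['captured_by' => $user];
            }
        }
    }
    $timeline = array_values($byHash);
    // Sort by capture time, newest first (also an assumption).
    usort($timeline, function ($a, $b) {
        return $b['captured_at'] <=> $a['captured_at'];
    });
    return $timeline;
}

// aaron follows bob, chris and dale.
$timeline = buildTimeline($assetsByUser, ['bob', 'chris', 'dale']);
```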

I spent a lot of time fighting with CouchDB to solve this problem. I asked in the #couchdb channel on Freenode, posted to the mailing list and even resorted to StackOverflow (a couple times) before coming up with a "solution." I put the word "solution" in quotes there because what I was told to do only partially solved our problem.

The general consensus was that we should denormalize our follow/following + asset records in an extreme way (as you can see in the StackOverflow posts, above). I ended up creating an interim index of all of a user's followers/following links, plus an index of all of the media hashes (what we use to uniquely identify assets, even when captured by different users). Those documents got pretty big pretty quickly (even though we had fewer than 100 users at the time). Here's an example: Cameron's FollowersIndex document.

As you might guess, even a system designed to handle large documents like this (such as CouchDB) would have a hard time with the sheer size. Every time an asset was captured, it would get injected into the FollowersIndex documents, which caused a reindex… which used up a lot of RAM, and caused bottlenecks. Severe bottlenecks. Our 8GB of RAM was easily exhausted by our JavaScript map function. Think about that. 8GB… for <100 users. This was not going to survive. Turns out we were exhausting Erlang's memory allocator and actually crashing CouchDB. From userspace.

I asked around, and the proposed solution to this problem-within-a-problem was to rewrite the JavaScript map in Erlang to avoid the JSON conversion overhead. At this point, I was desperate. I had Evan (who is a valuable member of the team, and a far better computer scientist than I am) translate the JS to Erlang. What he came up with made my head hurt, but it worked. And by "worked," I mean that it didn't crash CouchDB and send it into a recovery spiral (crash, respawn, reindex, crash, repeat)… but it did work. Enough to get us by for a few weeks, and that's what we did: get by. The index regeneration for the friends feed was so slow that I had to use delayed indexes and reindex in cron every minute. CouchDB was using most of our computing resources, and we knew we couldn't sustain any growth with this system.

At this point, we decided to cut our losses, and I went to investigate other options, including MySQL and MongoDB. My next blog post will be on why I think MongoDB is a superior solution for building web applications, despite CouchDB being better in certain areas.

PHP Community Conference

2011-Mar-24

I was once told that "the only reason you're successful is that you were at the right place at the right time." Other than the word "only" in that declaration, the accuser was mostly right. The reason I'm [moderately] successful is that I was at the right place at the right time. The subtlety in the second statement is in the reason I was at the right place at the magical time.

I firmly believe that my technical skills are only part of my value, career-wise. Looking back on my career so far, I can definitely see opportunities that arose because of being at the right place. What wasn't considered in the flippant statement was why I was there, when I was.

To me, it's clear: I've taken measures to put myself in the right place, when it was beneficial to do so. I've been doing this for years, and it's paid off.

Want to know how I became the Editor-in-Chief of php|architect magazine, a Web Architect at OmniTI, and was put into contact with my co-founder for Gimme Bar? Sure, my abilities to build web stuff played into all of those roles, but the way I found myself in all of those positions was by asking. Yes, asking.

Was I in the right place at the right time when I noticed Marco commenting about having to edit the current issue of php|architect, and I chimed in "hey, I kind of actually like that sort of thing," half a decade ago? Definitely, but it's more complicated than "luck."

Similarly, when I approached Chris Shiflett about working with OmniTI, his immediate reaction was "Of course there's room on my team for you; we'll just need to work out the details." Am I that good, when it comes to coding, architecting large deployments, and managing a team? Definitely not—even less so back then.

The real question is why was I hanging out on IRC when Marco was venting, or how was it so easy for me to have Chris's ear? The answer is simple: I'd established myself as part of the PHP community, and had a standing with those guys, even without having ever worked with them, directly (I had written for php|architect before, but it wasn't under Marco's direct supervision).

I assume that many of you readers are already members of the community in some way. That could be as simple as participating on mailing lists or forums, helping reproduce bugs, or fixing grammatical errors in the manual. One of the best ways I've found to connect with the community, though, is in person.

Nearly everyone I know and have had a long-term relationship with, in the PHP community, I met at a conference. Sure, I'd often "known" someone from their online persona, but it's hard to really "know" someone until you've spent some face time with them, preferably with a beer or two between you.

This is one of the main reasons that I think the PHP Community Conference in Nashville, in just about a month, is important, and why I think you should go. I have no personal stake in this (in fact, since it's run by the community, the only stake to be had is a potential loss by the organizers; there is no profit to be had). I just think it's going to be a great event and a wonderful opportunity for attendees, and not just from a career perspective: I expect everyone who attends will become more valuable to their current employers, too, based simply on knowledge gained and connections made. (There's a huge amount of value in being able to fire off a friendly email to the author of (e.g.) the memcached extension, when you get stuck, and to already be on a first-name basis.)

I'm also speaking, there, on Gimme Bar. It won't be a pitch. It will be more of a show-and-tell session on which technologies we use, how we've built what we have so far, what I think we've done right, and a frank discussion on the mistakes we've made (so far (-: ).

If you can, you should make it to the PHP Community Conference, and be in the right place at the right time, whether it's Nashville on April 21 and 22, or sometime in your future.

Ideas of March

2011-Mar-15

Around two weeks ago, Chris wrote a blog post that I responded to, and I was reminded of some of the great conversations that helped build our community. Many of these took place on the blogs of the aughts.

coates: We, the web community, used to have great conversations on blogs. Here's a chance again: http://j.mp/JSandURLs (Bonus: more than 140 chars.)

Like Chris, I think we've lost a bit of that. I've seen what feels like hundreds of conversations fly by on Twitter, 140 characters at a time: incomplete thoughts crammed into a package that's simply too small for detailed and deep expression. Don't get me wrong—a stream like Twitter (or maybe not Twitter itself) is valuable for quick thoughts and light conversation, but we often need more than that.

Thus, like others, I am pledging to do more blogging this year than last, starting now.

I recently spoke at ConFoo, and I intend to turn my Fifty Things talk into a series of short blog posts. I've also been mulling over a post on how and why we ported Gimme Bar from CouchDB to MongoDB. Those will hopefully pave the way and form a habit and personal culture of blogging. Please feel free to hold me to this intent, and if you have a blog, I hope you'll join this effort of creating a blogging revival (and if you don't yet have a blog, check out Habari).

See you soon.

Gimme Bar: One Year Old

2011-Jan-19

Exactly one year ago, today, Gimme Bar was born.

Gimme Bar has been the focus of my work for that entire time, and I haven't blogged about it (much).

Ironically (sort of), I've been far too busy working on Gimme Bar to do much writing about Gimme Bar, but I thought it fitting to take a couple minutes to write a few words about it today.

The elevator pitch for the project goes something like this: Gimme Bar is a personal utility to help you capture and collect interesting things you find in your day-to-day use of the Web.

I'm admittedly not very good at the pitch, but my colleague (and Gimme Bar's backer) Cameron is, and we released this demo video, yesterday:

We're in the middle of some huge changes on the technical side that I intend to blog about, and once those are released, I hope to add a lot more active users. If the video makes it sound like something you might be interested in, be sure to sign up for an invitation. We've got some really great stuff coming in the pipeline, too, for existing users.

I'll post more, soon, I hope.

Post-Advent 2010

2010-Dec-25

As I write this on Christmas Eve, Chris is putting the finishing touches on PHP Advent 2010.

A brief search of my site indicates that I haven't actually blogged about PHP Advent since 2007, when I was lucky enough to write the first article. That first year, Chris put the advent articles up on his blog (and we do intend to copy them over to phpadvent.org, eventually). Sensing that Chris had entirely too much work to do, curating, and since we were working together by the time the season came around in 2008, I offered to help with editing and curation; I did, after all, know the pain/joy of putting together a magazine.

Chris took me up on my offer, and he enlisted Jon and Jon to design and build a proper site. We commissioned authors a little too late, but they came through and PHP Advent 2008 was a success.

By the time 2009 came around, Chris was already deep into preparing to launch Analog, and I'd already announced (internally) that I was moving on to other things. As a result, 2009's Advent was hard. Really hard. We commissioned authors too late, didn't set solid deadlines (as much as we hate deadlines, this sort of date-sensitive project requires them), neglected to dedicate enough time to author herding and editing, and to top it all off, I was headed to Costa Rica for a much-needed vacation, leaving Chris holding the bag for the last five days of 2009's season. Things were so bad at one point, last year, that I took it upon myself to write an article just so that we didn't miss a day. Luckily, we made it through (and by we, I mean Chris, because by the time my flight to San Jose on Dec. 19th came around, I'd had quite enough of Advent for the year).

If we learned anything from PHP Advent 2009, it was sadly not from the great articles, but instead from our own failures. If we were going to do this again in 2010, we needed to get on it early, and we needed to attack with full-force. I set my calendar to start bugging me in August, but even though I was hassled by its weekly reminders, we found ourselves at the start of November, wrecked from just having organized a conference, and in the middle of two product launches. Despite feeling like we didn't want to have the trouble of Advent again in 2010, neither of us dared say it to the other…at least not in so many words.

Due only to the abilities and professionalism of our most excellent authors, PHP Advent 2010 was (at least in my opinion) the best year yet. They wrote wonderful, substantial, punchy articles that informed our readers and generated significantly more traffic than we've seen in previous years: over 70,000 views, from more than 25,000 unique visitors, so far. Data from past years tells us that these numbers drop slightly starting on the 25th, as we cease to post new content, but remain strong into January, with constant, lower traffic and occasional blips throughout the year. The most popular article this year had more than 10,000 views!

As we post the last article of 2010, I'm encouraged by all of this, and (contrary to how I felt in 2009) am actually looking forward to making PHP Advent even better in 2011.

Thank you Chris, and thank you authors. Have a wonderful new year.

Remote pbcopy

2010-Nov-30

I use the command line a lot. I'm sure many of you do, too.

I find myself often piping things between processes:

$ cat seancoates.com-access_log \
> | awk {'print $1'} \
> | sort \
> | uniq \
> | wc -l
627
$ # unique IPs

One particularly useful tool on my Mac is the pbcopy utility, which takes standard input and puts it on the pasteboard (this is known as the "clipboard" on some other systems). Its sister application, pbpaste, is also useful (it outputs your pasteboard to standard output when your pasteboard contains data that can be represented in some sort of text form; if you have image data copied, for example, pbpaste yields no output).

$ cat seancoates.com-access_log \
> | awk {'print $1'} \
> | sort \
> | uniq \
> | pbcopy
$ # the list of unique IPs is now on my pasteboard

I find this particularly useful for getting information from the command line into a GUI application.

Wouldn't it be even more useful if we could pbcopy from a remote SSH session? Indeed it would. Here's how.

The first thing you need is a listener on your local machine. Luckily, Apple has provided us with launchd and its administration utility, launchctl. This is basically [x]inetd for your Mac (plus a bunch of other potentially great stuff that I simply don't understand). Put the following in ~/Library/LaunchAgents/pbcopy.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
     <key>Label</key>
     <string>localhost.pbcopy</string>
     <key>ProgramArguments</key>
     <array>
         <string>/usr/bin/pbcopy</string>
     </array>
     <key>inetdCompatibility</key>
     <dict>
          <key>Wait</key>
          <false/>
     </dict>
     <key>Sockets</key>
     <dict>
          <key>Listeners</key>
               <dict>
                    <key>SockServiceName</key>
                    <string>2224</string>
                    <key>SockNodeName</key>
                    <string>127.0.0.1</string>
               </dict>
     </dict>
</dict>
</plist>

…then, run: launchctl load ~/Library/LaunchAgents/pbcopy.plist

This sets up a listener on localhost (127.0.0.1) port 2224, and sends any data received on this socket to /usr/bin/pbcopy. You can try it with telnet:

$ telnet 127.0.0.1 2224
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hello
^]
telnet> ^D
Connection closed.

…then try pasting. You should have hello (followed by a newline) on your pasteboard.

The next step is tying this into SSH. Add RemoteForward 2224 127.0.0.1:2224 to ~/.ssh/config. This will tell your SSH connections to automatically forward the remote machine's local port 2224 to your local machine, on the same port, over your encrypted SSH tunnel. It's essentially the same thing as adding -R2224:localhost:2224 to your SSH connection command.

Now you have a listener on your local machine, and a secure tunnel from remote servers to this listener. We need one more piece to tie everything together. Put the following in a file (preferably in your path) on the remote machine(s) where you'd like a pipe-friendly pasteboard:

cat | nc -q1 localhost 2224

…I like to put this in ~/bin/pbcopy or /usr/local/bin/pbcopy on servers where I have root. You'll also need to chmod +x this file to make it executable. You'll need the nc executable, which is often available in a package called netcat. This invocation of nc takes standard input and pushes it to localhost on port 2224. (The -q1 flag tells nc to quit one second after standard input ends; not every netcat variant supports it.)

Now you should have a useful pbcopy on your remote server(s). Be aware, though, that there is no additional security on this port connection. If someone on the remote machine can connect to localhost:2224, they can inject something into your pasteboard. This is usually safe, but you should definitely keep it in mind. Also, if you have multiple users using this technique on the same server, you'll probably want to change the port numbers for each user.

I use this technique all the time. Now you can too. Hope it's helpful.

Brooklyn Beta

2010-Oct-27

Last week, many of the Web's most influential developers and designers converged on a seemingly unremarkable art space (née factory for novelty invisible dog leashes) in Brooklyn for the first of what I hope will become a long-standing conference tradition: Brooklyn Beta.

Despite having personally helped organize several other conferences in the past, Brooklyn Beta has easily earned a spot at the top of my list of favourite events in my career.

I've been involved with planning this event nearly since its inception, but mostly in an advisory role. My friends and colleagues Cameron and Chris did the heavy lifting and deserve all of the credit, though they'd be the first to object to this statement by identifying the many people who came together to volunteer and without whom the conference simply would not have happened.

The goal of BB was to get a group of developers, designers, and (a few) savvy business-type people — the makers of the Web — in one room to meet, converse, show & tell, and hopefully to inspire them to collaborate and make something. Even though only a few days have passed, I know this effort was successful, and I can't wait to see the applications, sites, art and teams that arise and attend next year's conference.

In addition to the impeccable list of speakers, what really made BB stand out was the group of attendees who had the pleasure of spending the day(s) together. Despite my daze (see below), I finally put a face to many of the people whose blogs and Twitter streams have occupied large amounts of my career.

Much of the time leading up to Brooklyn Beta is a blur — we've been frantically trying to finish our app in time to demo (more below) at BB, in addition to handling last-minute details, and we quite obviously bit off more than we could chew. A tip: organize a conference OR finish a large application; don't do both in the same week.

To keep this from turning into rambling and to let me get back to putting some polish on the aforementioned app, here are a few things I feel worth highlighting, in point form:

- I am blown away by the overwhelming positivity associated with Brooklyn Beta. I've been following the associated Twitter stream, and with the exception of one misinformed whiner (who didn't even attend BB), I've seen nothing but glowing reviews. Further reading: Fred Wilson (one of our speakers), Josh Smith (Plaid), and Charlie O'Donnell; I also put my photos of Brooklyn Beta on Flickr.
- The first talk of the day, by Shelley Bernstein, far exceeded my expectations. It's not that I had low expectations, it's that the talk was absolutely full of wisdom and good practices. If you have the opportunity to see Shelley give this talk, I suggest you take it.
- Marco Arment's talk on giving up his day job at Tumblr to focus his efforts on Instapaper was very inspiring. If I weren't already hip-deep in a startup, I'm pretty sure I wouldn't be able to resist the urge to build something of my own after hearing Marco speak.
- Fred Wilson, who (from what I can gather) has been key in funding at least $80M of our peers' projects this year, spoke on Golden Principles for Successful Web Apps. The talk as a whole was very good, but he immediately captured my attention when he opened with a statement that seems obvious to me, but I feel is under-represented in the industry: Speed Matters. This point wasn't buried in the middle of a discussion; it was at the forefront of his talk. Remember this point; Fred obviously knows his stuff.
- Gimme Bar! As I hinted above, we're on the cusp of launching a project that I've been working on full time since leaving my day job at the end of 2009. We demoed Gimme Bar at Brooklyn Beta and received universally positive and excited comments. This is extremely encouraging. You will hear more about this before next week.
- Similarly, my friends at Analog demoed their project, Mapalong, which was also positively received. I'm excited for them to be launching as well.

If you missed Brooklyn Beta this year, hopefully you won't let it pass you by again in 2011.

There's so much more I could say… but I've got a project to launch. (-:

Arbitrary Incrementer in PHP

2010-Aug-05

On several recent occasions I had a need for an incrementer that uses an arbitrary character set and I thought I'd share my code with you.

I've used this code in the GPL Virus that I wrote to poke fun at the WordPress/Thesis/GPL debacle, as well as in some cleanup I'm doing for the extremely useful JS Bin project.

The most important application, however, was in creating a URL shortening system for the as-yet-unannounced startup project that I'm working on.

I wanted the URL shortener to make the shortest possible URLs. To keep the number of characters in a URL short, I had to increase the set of characters that could comprise a key.

To illustrate this, consider a hexadecimal number versus its decimal equivalent:

$num = 32323232321;
echo $num . "\n";
echo dechex($num) . "\n";

This outputs:

32323232321
7869d6241

As you can see, the second number is two characters shorter than the first number. The reason for this is that every digit of a decimal number is represented by one of 0123456789 (10 unique characters), while every digit of the hexadecimal number is represented by one of 0123456789abcdef (16 unique characters). This means that we can pack more information into each digit, making the overall length of the key shorter.

PHP has a base_convert() function that allows any sequential base up to 36 (the number of letters in the alphabet (26) plus the 10 numeric digits). We can further compress the above example by increasing the base from 16 (hexadecimal) to 36:

$num = 32323232321;
echo $num . "\n";
echo base_convert($num, 10, 16) . "\n";
echo base_convert($num, 10, 36) . "\n";

Using the full spectrum saves us 4 characters:

32323232321
7869d6241
eukf1oh

Unfortunately, base_convert() does not take the base beyond 36. I wanted to increase the information density (and thus decrease the length of the tokens) even further. URLs are case-sensitive, so why not use both uppercase and lowercase letters? We might as well throw in a few extra characters (- and _).
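To put rough numbers on the gain, the sketch below counts how many characters the same value needs in each base, including a 64-character set. The digitsNeeded() helper is hypothetical (it is not a PHP built-in), and the example assumes PHP 7 or later on a 64-bit build so that the number fits in a native integer.

```php
<?php
// Number of digits needed to write $n in a given base (for $n >= 1).
// Hypothetical helper for illustration; not part of PHP.
function digitsNeeded(int $n, int $base): int
{
    $digits = 0;
    while ($n > 0) {
        $n = intdiv($n, $base); // drop the lowest digit
        $digits++;
    }
    return $digits;
}

$num = 32323232321;
foreach ([10, 16, 36, 64] as $base) {
    printf("base %2d: %d characters\n", $base, digitsNeeded($num, $base));
}
```

For this value the lengths come out as 11, 9, 7 and 6 characters respectively: each step up in base buys less than the one before, but for a URL shortener every character counts.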

Additionally, I wanted to be able to increment the sequence, based on the current maximum value. PHP offers no facility as simple as base_convert for this (and the $a = "zzz"; echo ++$a; trick doesn't quite do what I need).

After a bit of code wrangling, I came up with the following algorithm that allows an arbitrary character set, and increments over it, recursively.

function inc($n, $pos=0)
{
    static $set = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_';
    static $setmax = 63; // index of the last character in $set (strlen($set) - 1)

    if (strlen($n) == 0) {
        // no string
        return $set[0];
    }

    $nindex = strlen($n) - 1 - $pos;
    if ($nindex < 0) {
        // add a new digit to the front of the number
        return $set[0] . $n;
    }

    $char = $n[$nindex];
    $setindex = strpos($set, $char);

    if ($setindex == $setmax) {
        $n[$nindex] = $set[0];
        return inc($n, $pos+1);
    } else {
        $n[$nindex] = $set[$setindex + 1];
        return $n;
    }
}

To change the set, simply alter the $set variable, and adjust the $setmax accordingly. I hope you find this as useful as I have.
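If keeping the rollover index in sync with the set by hand feels error-prone, one option is to pass the set in and derive the index from it. The following is a hypothetical iterative variant, not the function above: the name incrementToken(), the exception on unknown characters, and the PHP 7+ type declarations are all additions for illustration. It is meant to produce the same sequence, including growing by one character once every position has rolled over.

```php
<?php
// Hypothetical iterative variant of inc(); the character set is a parameter,
// so the rollover index is always derived from the set itself.
function incrementToken(string $token, string $set): string
{
    $last = strlen($set) - 1;
    // Walk from the rightmost character towards the front.
    for ($i = strlen($token) - 1; $i >= 0; $i--) {
        $index = strpos($set, $token[$i]);
        if ($index === false) {
            throw new InvalidArgumentException("Character not in set: {$token[$i]}");
        }
        if ($index < $last) {
            // No carry needed: bump this character and stop.
            $token[$i] = $set[$index + 1];
            return $token;
        }
        // Carry: reset this position and continue to the left.
        $token[$i] = $set[0];
    }
    // Every position rolled over (or the token was empty): grow by one character.
    return $set[0] . $token;
}

// Usage with a tiny two-character set, to make the sequence easy to follow.
$token = '';
$sequence = [];
for ($i = 0; $i < 5; $i++) {
    $token = incrementToken($token, 'ab');
    $sequence[] = $token;
}
echo implode(', ', $sequence), "\n"; // a, b, aa, ab, ba
```

With the two-character set the sequence is a, b, aa, ab, ba: every string of a given length is used before the token gets longer, which is what keeps the generated keys as short as possible.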

After writing this piece, but before publishing it, I stumbled upon some similar code that they use at Flickr to do arbitrary base conversion, so take a peek over there to see how they handle this.

Beer Alchemy Integration

2010-Jul-27

Note from future Sean: this is certainly dead code 10 years later; leaving here for reference in case it's somehow useful.

As I mentioned in my previous post, my beer recipes are now online.

I've had several people ask me how this is done, so I think a post is in order.

While it's entirely possible to brew beer at home without any fancy gadgets, there are several tools I use (such as my refractometer) that make the process easier, more controlled, or both. Brewing software is one of the few instruments that I'm not sure I'd want to brew without. I use a Mac, primarily, so Beer Alchemy (BA) is the obvious choice for recipe formulation, calculation, and logging.

BA has its own HTML export mechanism for recipes, and I used this for quite a long time, but I was never really satisfied with the results. The markup was hard to style, contained a lot of clutter (occasionally useful, but often redundant information such as style parameters), and simply didn't fit well with the rest of my site.

You can also export from BA in PDF (not suitable for web publishing), ProMash's binary recipe format (a pain to convert, although there do seem to be some tools to help with this), BeerXML (normally the most accessible, but in my opinion, a poorly-designed XML format), or in BA's native .bar ("Beer Alchemy Recipe") format, which is what I chose.

The bar format contains a property list, similar to those found throughout Apple systems. Property lists are either binary or XML (but the XML is very difficult to work with using traditional tools because of the way it employs element peering instead of a hierarchy to manage relationships). Luckily, I found a project called CFPropertyList that allows for easy plist handling in PHP. (I even contributed a minor change to this project, a while ago.)

Once you've run the .bar file's contents through CFPropertyList, layout is very simple. Here's most of the code I use to generate my recipes:

<?php
$beerPath = __DIR__ . '/../resources/beer/';

$recipes = apc_fetch('seancoates_recipes');
$fromCache = true;
if ($recipes === false) {
	$fromCache = false;
	$recipes = array(); // start from an empty list rather than the false returned on a cache miss
	foreach (new DirectoryIterator($beerPath) as $f) {
		if ($f->isDot()) {
			continue;
		}
		if (substr($f->getFilename(), -4) != '.bar') {
			continue;
		}
		$cfpl = new CFPropertyList($beerPath . '/' . $f->getFilename());
		$recipe = $cfpl->toArray();
		$title = $recipe['RecipeTitle'];
		$recipes[self::slugify($title)] = array(
			'title' => $title,
			'content' => $recipe,
		);
	}
	asort($recipes);
	if ($recipes) {
		apc_store('seancoates_recipes', $recipes, 3600); // 1h
	}
}

In addition to displaying the recipe's data, I also wanted to show the approximate (calculated) beer colour. Normally, beer recipes declare their colour in "SRM" (Standard Reference Method). There's no obvious, simple, and direct way to get from SRM (which is a number from 0 to 40—and higher, but above the mid 30s is basically "black") to an HTML colour.

I found a few tables online, but I wasn't terribly happy with any of them, and keeping a big lookup dictionary around felt ugly. I like the way Beer Alchemy previews its colours, and since it has HTML output, I emailed the author to see if he'd be willing to share his algorithm. Steve from Kent Place Software graciously sent me an excerpt from his Objective-C code, and I translated it to PHP. This might be useful for someone, and since Steve also granted me permission to publish my version of the algorithm, here it is:

<?php
/**
 * Calculate HTML colour from SRM
 * Thanks to Steve from Kent Place Software (Beer Alchemy)
 *
 * @param float $srm the SRM value to turn into HTML
 * @return string HTML colour (without leading #)
 */
public function srm2html($srm)
{
	if ($srm <= 0.1) { // It's water
		$r = 197;
		$g = 232;
		$b = 248;
	} elseif ($srm <= 2) {
		$r = 250;
		$g = 250;
		$b = 60;
	} elseif ($srm <= 12) {
		$r = (250 - (6 * ($srm - 2)));
		$g = (250 - (13.5 * ($srm - 2)));
		$b = (60 - (0.3 * ($srm - 2)));
	} elseif ($srm <= 22) {
		$r = (192 - (12 * ($srm - 12)));
		$g = (114 - (7.5 * ($srm - 12)));
		$b = (57 - (1.8 * ($srm - 12)));
	} else { // $srm > 22
		$r = (70 - (5.6 * ($srm - 22)));
		$g = (40 - (3.1 * ($srm - 22)));
		$b = (40 - (3.2 * ($srm - 22)));
	}
	$r = max($r, 0);
	$g = max($g, 0);
	$b = max($b, 0);
	return sprintf("%02X%02X%02X", $r, $g, $b);
}
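
As a rough illustration of how the returned string might be used, here is a minimal sketch that turns it into an inline swatch. The colourSwatch() helper, the "swatch" class name, and the fallback to white are assumptions made for this example; they are not part of the site's actual code.

```php
<?php
/**
 * Build an inline colour swatch for a recipe page.
 * $hex is assumed to be a six-digit colour without the leading "#",
 * which is the form srm2html() returns.
 * colourSwatch() itself is a hypothetical helper.
 */
function colourSwatch($hex, $label)
{
    if (!preg_match('/^[0-9A-Fa-f]{6}$/', $hex)) {
        $hex = 'FFFFFF'; // fall back to white for anything unexpected
    }
    return sprintf(
        '<span class="swatch" style="background-color: #%s" title="%s"></span>',
        $hex,
        htmlspecialchars($label, ENT_QUOTES, 'UTF-8')
    );
}

// FAFA3C is what the function above yields for any SRM between 0.1 and 2.
echo colourSwatch('FAFA3C', 'SRM 2'), "\n";
```

Validating the string before dropping it into a style attribute keeps a bad value from leaking into the markup.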

A new seancoates.com

2010-Jul-20

Over the past few weeks, my business partner Cameron and I have spent evenings, late nights, and weekends (at least partially) working on a much-improved seancoates.com.

(Screenshots: old seancoates.com and new seancoates.com)

If you’re reading this via my feed, or through a syndication outlet, you probably hadn’t noticed.

The primary goal of this change was to reduce (hopefully even remove) the ugliness of my main presence on the Web, and I’m very happy with the results.

In addition to making things look nicer, we also wanted to improve the actual functionality of the site. Formerly, seancoates.com was a blog, with a couple haphazard pages thrown in. The new version serves to highlight my blog (which I fully intend to pick up with more frequency), but also contains a little bit of info about me, a place to highlight my code and speaking/writing contributions, and a good place for me to keep my beer recipes.

Cameron came up with the simple visual design and great interaction design, so a public “Thank You” is in order for his many hours of thought and contribution. Clearly, the ugliness reduction was his doing (due to my poorly-functioning right brain).

I’m very happy with how the site turned out as a whole, and thought I’d outline a few of my favourite bits (that might otherwise be missed at first glance).

URL Sentences

The technique of turning URLs into sentences was pioneered by my friend and colleague Chris Shiflett. Cameron (who shares studio space (and significant amounts of beer) with Chris) and I both like this technique, so we decided to implement it for my site.

The main sections of the site are verbs, so this was pretty easy (once we decided on proper nomenclature). Here are a few examples:

seancoates.com/blogs – Sean Coates blogs…
seancoates.com/blogs/about-php – Sean Coates blogs about PHP (my “PHP” blog tag)
seancoates.com/brews – an index of my published recipes
seancoates.com/brews/coatesmeal-stout – the recipe page for Coatesmeal Stout

To complement the URLs, the page title spells out the page you’re viewing in plain language, and the visual site header indicates where you are (while hopefully enticing you to click through to the other sections).
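
The idea of spelling a URL out as a sentence can be sketched in a few lines. This is only an illustration: sentenceForPath() is a made-up name, and the real site presumably uses a lookup to get proper capitalisation (such as "PHP"), which this sketch does not attempt.

```php
<?php
/**
 * Turn a path such as "/blogs/about-php" into a plain-language sentence.
 * Hypothetical helper: the subject and the hyphen-to-space rule are assumptions.
 */
function sentenceForPath($path, $subject = 'Sean Coates')
{
    // Split on slashes and drop empty segments (leading, trailing, doubled).
    $segments = array_values(array_filter(explode('/', trim($path, '/')), 'strlen'));
    if (!$segments) {
        return $subject;
    }
    $words = array();
    foreach ($segments as $segment) {
        $words[] = str_replace('-', ' ', $segment); // "about-php" becomes "about php"
    }
    return $subject . ' ' . implode(' ', $words);
}

echo sentenceForPath('/blogs/about-php'), "\n";
echo sentenceForPath('/brews/coatesmeal-stout'), "\n";
```

Because the section names are verbs, the sentence falls out of the path with no per-page configuration.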

Moving my blog from the root “directory” on seancoates.com to /blogs caused my URLs to break, so I had to whip up yet another bit of transition code to keep old links functioning. Even links on my original blog (which was hosted on blog.phpdoc.info) should still work. If you find broken links, please let me know.

Vertical Content Integration

My “/is” page contains feeds from Twitter and Flickr.

The Twitter integration was pretty simple; I use the JSON version of my user feed, but I didn’t want to include @replies, so they’ve been filtered out by my code. If the fetch was successful, the filtered data is cached in APC for a short period of time so that I’m not constantly hammering Twitter’s API.
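
A minimal sketch of that filter-then-cache flow follows. It assumes each decoded tweet is an array with a 'text' key and treats anything starting with "@" as a reply. The function names, the cache key, and the five-minute lifetime are invented for the example, and the APC call is guarded so the sketch also runs where APC isn't installed.

```php
<?php
/**
 * Drop @replies from a decoded timeline.
 * Assumes each tweet is an array with a 'text' key (an assumption about the feed's shape).
 */
function filterReplies(array $tweets)
{
    return array_values(array_filter($tweets, function ($tweet) {
        return isset($tweet['text']) && strpos(ltrim($tweet['text']), '@') !== 0;
    }));
}

/**
 * Decode a fetched JSON body; return false if the fetch or decode failed,
 * so the caller knows not to cache the result.
 */
function tweetsFromJson($json)
{
    $decoded = is_string($json) ? json_decode($json, true) : null;
    if (!is_array($decoded)) {
        return false;
    }
    return filterReplies($decoded);
}

// Stand-in for the body of a real HTTP response.
$json = '[{"text":"@someone thanks!"},{"text":"Brewing a stout today"}]';
$tweets = tweetsFromJson($json);

// Only cache successful fetches; the key and TTL here are made up.
if ($tweets !== false && function_exists('apc_store')) {
    apc_store('example_tweets', $tweets, 300); // 5 minutes
}
```

Caching only on success means a failed fetch is retried on the next request rather than pinned for the whole cache lifetime.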

Flickr’s integration was also very simple. After a run-in with some malformed JSON in their API, I decided to integrate through their Serialized PHP Response Format. The resulting data is also cached in APC, but for a longer period of time, as my beer tasting log changes much less frequently.
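
Handling a serialized PHP response might look something like the sketch below. The parseFlickrResponse() name and the sample payload are assumptions for illustration; the payload is built locally with serialize() rather than fetched, and only the top-level 'stat' check reflects how I understand Flickr's responses to be shaped.

```php
<?php
/**
 * Decode a serialized PHP response body and check that it reports success.
 * Returns the decoded array, or false if the body is unusable.
 * Hypothetical helper, not the site's actual code.
 */
function parseFlickrResponse($body)
{
    // allowed_classes => false (PHP 7+) keeps remote data from instantiating objects.
    $data = @unserialize($body, array('allowed_classes' => false));
    if (!is_array($data) || !isset($data['stat']) || $data['stat'] !== 'ok') {
        return false;
    }
    return $data;
}

// A made-up payload standing in for a real API response.
$body = serialize(array(
    'stat'   => 'ok',
    'photos' => array(
        'photo' => array(
            array('id' => '1', 'title' => 'Tasting: oatmeal stout'),
        ),
    ),
));

$response = parseFlickrResponse($body);
if ($response !== false) {
    foreach ($response['photos']['photo'] as $photo) {
        echo $photo['title'], "\n";
    }
}
```

Since unserialize() runs on data from a third party, restricting it to plain arrays and scalars is a sensible precaution.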

Code Listings

Displaying code listings on a blog isn’t quite as easy as it sounds. I recently had a discussion with a friend about redesigning his site, and he was considering using Gist, GitHub’s pastebin-like service. Doing so would have given him easy highlighting, but one thing he hadn’t considered was that his blog’s feed would be missing the embedded listings (they come from a third party, and wouldn’t actually appear in his feed’s data stream).

Another problem we faced was one of space. While I often try to keep code to a maximum of 80 (or slightly fewer) characters wide, this isn’t always possible. Injecting a line break into the middle of a line of code is risky, especially for things like SSH keys and URLs. This problem is usually solved by setting the content’s CSS to overflow: scroll, but that littered Cameron’s beautiful design with ugly platform-specific scroll bars. “Clever” designers and developers sometimes overcome this by implementing “prettier” scroll bars, but I’m strongly against this behaviour, so I wouldn’t have it on my site.

I’m quite happy with our eventual solution to this problem. Now, when a blog post contains code that extends beyond the normal width of the blog’s text, the right-most part of the text fades to white, and the listing is clickable. Clicking expands all listings on the page to the minimum width that will accommodate all embedded code.

Here's some example code that stretches much wider than this column would normally allow.
Injecting line breaks is dangerous. Here's why: http://example.com/obviously/not/a/sentence/url
Breaking that in the middle is far from ideal.

jQuery saved me hours of development work here, and I couldn’t recommend it more highly. Highlighting is provided by a plugin that I wrote a couple years ago. It uses GeSHi to highlight many languages. I’ve never been very happy with GeSHi’s output, but it’s Good Enough™ until I can find time to implement a better solution that uses the Tokenizer for PHP.
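
For the curious, a Tokenizer-based highlighter could start out roughly like this. It is a sketch of the general approach using PHP's built-in token_get_all() and token_name(), not the plugin described above; the highlightPhp() name and the use of lower-cased token names as CSS classes are assumptions.

```php
<?php
/**
 * Wrap each PHP token in a span whose class is the lower-cased token name.
 * Hypothetical sketch; styling the classes is left to the stylesheet.
 */
function highlightPhp($source)
{
    $html = '';
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            list($id, $text) = $token;
            $class = strtolower(token_name($id)); // e.g. t_variable, t_echo
            $html .= '<span class="' . $class . '">' . htmlspecialchars($text) . '</span>';
        } else {
            // Single-character tokens such as ";" or "(" arrive as plain strings.
            $html .= htmlspecialchars($token);
        }
    }
    return $html;
}

echo highlightPhp('<?php echo $beer;'), "\n";
```

The Tokenizer only understands PHP, so other languages would still need something like GeSHi.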

Software

In addition to PHP, this site integrates a custom version of Habari, with our own theme and plugins. One of those plugins allows me to keep my blog posts in HTML files in my Git repository, to make for much easier editing, grepping, etc.

Everything except /blogs was built with the Lithium framework. It handles the routine parts, such as controllers, routing, and templates, so I didn’t have to write that code myself (which I find incredibly boring these days).

Hashgrid was invaluable in ensuring that the site aligns with a visual grid (again, thanks to Cameron’s meticulous expertise). Pressing your g key will show the grid he used. I even made a few improvements to how Hashgrid works, which I hope to eventually see in the master branch.