  1. Zappa and LambCI

    In the previous post, we talked about Python serverless architectures on Amazon Web Services with Zappa.

    Beyond the previously mentioned benefit of concentrating directly on the code of the apps we're building, instead of spending effort on running and maintaining servers, we get a few new tricks. One good example is that we can let our developers deploy (Zappa calls this update) directly to shared dev and QA environments, without having to involve anyone from ops (more on this in another post). We don't even require a build/CI system to push out these types of builds.

    That said, we do use a CI system for this project, but it differs from our traditional setup. In the past, we used Jenkins, but found it a bit too heavy. Our current non-Lambda setup uses Buildbot to do full integration testing (it not only runs our apps' test suites, but it also spins up EC2 nodes, provisions them with Salt, and makes sure they pass the same health checks that our load balancers use to ensure the nodes should receive user requests).

    On this new architecture, we still have a test suite, of course, but there are no nodes to spin up (Lambda handles this for us), no systems to provision (the "nodes" are containers that hold only our app, Amazon's defaults, and Zappa's bootstrap), and not even any load balancers to keep healthy (this is API Gateway's job).

    In short, our tests and builds are simpler now, so we went looking for a simpler system. Plus, we didn't want to have to run one or more servers for CI if we're not even running any (permanent) servers for production.

    So, we found LambCI. It's not a platform we would normally have chosen—we do quite a bit of JavaScript internally, but we don't currently run any other Node.js apps. It turns out that the platform doesn't really matter for this, though.

    LambCI (as you might have guessed from the name) also runs on Lambda. It requires no permanent infrastructure, and it was actually a breeze to set up, thanks to its CloudFormation template. It ties into GitHub (via AWS SNS), and handles core duties like checking out the code, running the suite only when configured to do so, and storing the build's output in S3. It's a little bit magical (the good kind of magic).

    It's also very generic. It comes with some basic bootstrapping infrastructure, but otherwise relies primarily on configuration that you store in your Git repository. We store our build script there, too, so it's easy to maintain. Here's what our build script (do_ci_build) looks like (I've edited it a bit for this post):

    #!/bin/bash
    # more on this in a future post
    # run our test suite with tox and capture its return value
    pip install --user tox && tox
    tox_ret=$?
    # if tox fails, we're done
    if [ $tox_ret -ne 0 ]; then
        echo "Tox didn't exit cleanly."
        exit $tox_ret
    fi
    echo "Tox exited cleanly."
    set -x
    # use LAMBCI_BRANCH unless LAMBCI_CHECKOUT_BRANCH is set;
    # this is because lambci considers a PR against master to be the PR branch
    BRANCH="$LAMBCI_BRANCH"
    if [[ ! -z "$LAMBCI_CHECKOUT_BRANCH" ]]; then
        BRANCH="$LAMBCI_CHECKOUT_BRANCH"
    fi
    # only do the `zappa update` for these branches
    # (the branch-to-stage mapping here is an assumption; use your own stage names)
    case $BRANCH in
        master)
            STAGE=dev
            ;;
        qa|staging|production)
            STAGE=$BRANCH
            ;;
        *)
            echo "Not doing zappa update. (branch is $BRANCH)"
            exit $tox_ret
            ;;
    esac
    echo "Attempting zappa update. Stage: $STAGE"
    # we remove these so they don't end up in the deployment zip
    rm -r .tox/ .coverage
    # virtualenv is needed for Zappa
    pip install --user --upgrade virtualenv
    # now build the venv
    virtualenv /tmp/venv
    . /tmp/venv/bin/activate
    # set up our virtual environment from our requirements.txt
    /tmp/venv/bin/pip install --upgrade -r requirements.txt --ignore-installed
    # we use the IAM profile on this lambda container, but the default region is
    # not part of that, so set it explicitly here:
    export AWS_DEFAULT_REGION='us-east-1'
    # do the zappa update; STAGE is set above and zappa is in the active virtualenv
    zappa update $STAGE
    # capture this value (and in this version we immediately return it)
    zappa_ret=$?
    exit $zappa_ret

    This script, combined with our .lambci.json configuration file (also stored in the repository, as mentioned, and read by LambCI on checkout), is pretty much all we need:

    {
        "cmd": "./do_ci_build",
        "branches": {
            "master": true,
            "qa": true,
            "staging": true,
            "production": true
        },
        "notifications": {
            "sns": {
                "topicArn": "arn:aws:sns:us-east-1:ACCOUNTNUMBER:TOPICNAME"
            }
        }
    }

    With this setup, our test suite runs automatically on the selected branches (and on pull request branches in GitHub), and if that's successful, it conditionally does a zappa update (which builds and deploys the code to existing stages).
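
    The branch-selection step is the part that decides whether a build deploys at all, so here it is again as a small Python sketch. The function and the dictionary are hypothetical (our real logic lives in the shell script), and the branch-to-stage mapping is an assumption for illustration; only the two LAMBCI_* variable names come from the script itself.

```python
# Hypothetical mapping from Git branch to Zappa stage (an assumption for
# illustration; use whatever stage names your zappa_settings define).
STAGE_FOR_BRANCH = {
    "master": "dev",
    "qa": "qa",
    "staging": "staging",
    "production": "production",
}


def pick_branch(env):
    """Prefer LAMBCI_CHECKOUT_BRANCH when it is set and non-empty.

    For a pull request, LAMBCI_BRANCH can name the target branch, so the
    checkout branch is the one that tells us what code is being built.
    """
    return env.get("LAMBCI_CHECKOUT_BRANCH") or env.get("LAMBCI_BRANCH", "")


def stage_for(env):
    """Return the stage to update, or None when no deploy should happen."""
    return STAGE_FOR_BRANCH.get(pick_branch(env))
```

    Passing the environment in as a plain dictionary (rather than reading os.environ inside the function) keeps the decision easy to test: a PR branch such as feature-x against master yields None, so the suite runs but nothing is deployed.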

    Oh, and one of the best parts: we only pay for builds when they run. We're not paying hourly for a CI server to sit around doing nothing on the weekend, overnight, or when it's otherwise idle.

    There are a few limitations (such as the time limit on Lambda functions, which means that the test suite plus the build must finish within that limit), but frankly, those haven't been a problem yet.

    If you need simple builds/CI, it might be exactly what you need.

  2. Zappa

    For the past few months, I've been focused primarily on a Python project that we're deploying without any servers.

    Well, of course that's not really true, but we're deploying it without any permanent servers.

    The idea of "serverless" architecture isn't brand new, anymore, but running serverless applications on the AWS infrastructure—which I've become very familiar with over the past few years—is still a pretty new concept.

    AWS Lambda has been around for a few years now. It's a platform that allows you to run arbitrary code in response to events, and pay only for gigabyte-seconds of RAM time. This means that someone else (Amazon) manages the servers, networking, storage, etc.
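
    To make "gigabyte-seconds" concrete, here is a rough arithmetic sketch. The function names are mine, and the price is deliberately a parameter you supply (I'm not quoting a rate here); the only thing it shows is how memory size and run time multiply together.

```python
def gigabyte_seconds(memory_mb, duration_ms, invocations=1):
    """Memory (in GB) multiplied by run time (in seconds), over all invocations."""
    return (memory_mb / 1024) * (duration_ms / 1000) * invocations


def estimated_cost(memory_mb, duration_ms, invocations, price_per_gb_second):
    """Cost estimate; price_per_gb_second is whatever rate applies to you."""
    return gigabyte_seconds(memory_mb, duration_ms, invocations) * price_per_gb_second
```

    For example, ten invocations of a 512 MB function that each run for two seconds come to 0.5 GB × 2 s × 10 = 10 gigabyte-seconds, and a function that is never invoked costs nothing under this model.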

    At some point, Lambda gained the ability to run Python code (instead of just JavaScript, C#, and Java). This piqued my interest, but we didn't have a whole lot of use for it in building web apps. Nevertheless, we used it to turn SNS notifications into IRC messages, so our #ops IRC channel would get inline notices that our Autoscalers were autoscaling.
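
    As a rough idea of what that kind of glue looks like, here is a sketch of the formatting half of an SNS-to-IRC function. It is not our actual code: the function name, the line format, and the 400-character cap are assumptions, and the part that actually talks to IRC is left out. The Records / Sns / Subject / Message keys are the ones SNS uses when it invokes a Lambda function.

```python
def sns_to_irc_lines(event, max_length=400):
    """Turn an SNS-triggered Lambda event into short one-line IRC messages."""
    lines = []
    for record in event.get("Records", []):
        sns = record.get("Sns", {})
        subject = sns.get("Subject") or "notification"
        message = sns.get("Message") or ""
        # IRC messages are single lines, so keep only the first line of the body.
        first_line = message.splitlines()[0] if message else ""
        lines.append(f"[{subject}] {first_line}"[:max_length])
    return lines
```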

    In early 2015, I tweeted: "…too bad @AWSCloud Lambda can’t listen (and respond) to HTTP(S) events on Elastic Load Balancer…".

    A while later, Amazon introduced API Gateway, which—amid other functionality that we don't use very much—had the ability to turn arbitrary HTTP(S) requests into AWS events. Things got interesting. You'll recall, from above, that Lambda functions can run in response to events.

    It was interesting because we could now respond to HTTP events, but it wasn't really possible to use regular tools and frameworks with API Gateway. We're used to building apps in Flask, not monolithic Python functions that do their own HTTP request parsing.

    As time went on, these tools got a little more mature and gained more useful features. I kept thinking back to my tweet where we could just run code, not servers.

    Then, in October—increasingly tired of the grind of otherwise-simple operations work—I went searching a bit harder for something to help with the monolithic lambda function problem, and I stumbled upon Zappa. It seemed to be exactly the kind of thing I was looking for. With a bit of boilerplate, hackery, and near-magic, it turns API Gateway request events into WSGI requests, and Flask (plus other Python tools) speaks WSGI. This looked great.
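
    To give a feel for the idea (and only the idea), here is a minimal sketch of turning an API Gateway style event into a WSGI call. This is not Zappa's code, and Zappa handles far more than this; the function names and the tiny hello_app are hypothetical. The event keys (httpMethod, path, headers, queryStringParameters, body) are the ones API Gateway's proxy integration sends.

```python
import io
from urllib.parse import urlencode


def event_to_environ(event):
    """Build a WSGI environ dictionary from an API Gateway proxy-style event."""
    body = (event.get("body") or "").encode("utf-8")
    headers = event.get("headers") or {}
    environ = {
        "REQUEST_METHOD": event.get("httpMethod", "GET"),
        "SCRIPT_NAME": "",
        "PATH_INFO": event.get("path", "/"),
        "QUERY_STRING": urlencode(event.get("queryStringParameters") or {}),
        "SERVER_NAME": headers.get("Host", "localhost"),
        "SERVER_PORT": "443",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.version": (1, 0),
        "wsgi.url_scheme": "https",
        "wsgi.input": io.BytesIO(body),
        "wsgi.errors": io.StringIO(),
        "wsgi.multithread": False,
        "wsgi.multiprocess": False,
        "wsgi.run_once": False,
    }
    # Request headers become HTTP_* keys, as WSGI expects.
    for name, value in headers.items():
        environ["HTTP_" + name.upper().replace("-", "_")] = value
    return environ


def call_wsgi(app, event):
    """Run a WSGI app against the event and return a proxy-style response."""
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(response_headers)

    body = b"".join(app(event_to_environ(event), start_response))
    return {
        "statusCode": int(captured["status"].split()[0]),
        "headers": captured["headers"],
        "body": body.decode("utf-8"),
    }


def hello_app(environ, start_response):
    """A stand-in WSGI app; a Flask app object is callable the same way."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("hello from " + environ["PATH_INFO"]).encode("utf-8")]
```

    Because Flask apps are WSGI callables, anything that can build an environ and collect the response like this can sit between API Gateway and an ordinary Flask app.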

    Little did I know that right around that same time, there were some new, barely-documented (at the time), changes to API Gateway that would help reduce the magic and hacky parts of the Zappa boilerplate.

    I quickly built my first simple Zappa-based app (it was actually porting a 10-year-old PHP app), and deployed it: paste.website.

    We're using this technology on a very large client project, too. It's exciting that we're going to be able to do it without having to worry about things like software upgrades, underutilized servers, and build nodes that cost us money while we're all sleeping.

    I'm not going to let this turn into yet-another-Zappa-tutorial—there are plenty of those out there—but if you're interested in this kind of thing and hadn't heard of Zappa before now, well… now you have.

    We (mostly Rich) even managed to get it working on the brand-new Python 3.6 target in Lambda.

  3. DST pain

    Tonight, in Montreal (and many other North American cities), we change from Standard Time to Daylight Time.

    I know we programmers complain about date/time math relentlessly, but I thought it was worth sharing this real-life problem that someone asked me about on Reddit this weekend:

    "It sounds like this is a serious problem that has affected you on more than one occasion. Story?"

    The simplest complicated scenario is: let's say we have a call scheduled between our team on the east coast of North America and a colleague in the UK at 10AM Montreal time.

    Normally Nottingham (UK, same as London time) is 5 hours ahead of Montreal. This is pretty easy. Our British colleague needs to join at 3PM.

    However, tonight, we change from EST to EDT in Montreal (clocks move one hour ahead). But the UK will still be on GMT tomorrow. So, now, the daily 10AM call becomes a 2PM call for the Brits.

    But this is only for the next 2 weeks, because BST starts on March 26th (BST is to GMT as EDT is to EST). Then, we go back to a 5 hour difference. So we can expect Europeans to show up an hour late for everything this week. Or maybe we're just an hour early on this side of the Atlantic.

    To make this more difficult, we often have calls between not only Montreal and England, but also those two plus Korea and Brazil.

    Korea doesn't employ Daylight Saving Time, so a standing 7AM call in Seoul (5PM in Montreal) becomes a 6PM call in Montreal.

    And to even further complicate things, our partners in São Paulo switched FROM DST to standard time on Feb 17. Because they're in the southern hemisphere the clock change is the opposite direction of ours, on a different day.

    So: yes. It has affected our team on many occasions. It's already very difficult to get that many international parties synced up. DST can make it nearly impossible.
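
    If you'd rather let a computer do this arithmetic, here is a small sketch using Python's zoneinfo module (3.9+, and it needs the system's time zone database). The helper name is mine, America/Toronto stands in for Montreal since the two share the same rules, and the year 2017 is an assumption based on BST starting on March 26.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

MONTREAL = ZoneInfo("America/Toronto")  # same clock rules as Montreal
LONDON = ZoneInfo("Europe/London")
SEOUL = ZoneInfo("Asia/Seoul")


def local_hour(year, month, day, hour, source, target):
    """Hour of day in `target` when it is `hour` o'clock in `source` on that date."""
    moment = datetime(year, month, day, hour, tzinfo=source)
    return moment.astimezone(target).hour
```

    With those definitions, local_hour(2017, 3, 10, 10, MONTREAL, LONDON) gives 15, the same call for March 13 gives 14, and for March 27 it is back to 15. Going the other way, a 7AM call in Seoul lands at 17 in Montreal on March 10 and at 18 on March 14.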

  4. Vermont

    I get asked, from time to time, what things I would recommend when visiting Vermont. Here's my list. I'll update it as I learn about new gems.

  5. DNS for VMs

    Previously we talked about using Vagrant at Fictive Kin and how we typically have many Virtual Machines (VMs) on the go at once.

    Addressing each of these VMs with a real hostname was proving to be difficult. We couldn’t just use the IP addresses of the machines because they’re unreasonably hard to remember, and because things like browser cookies don’t work properly with bare IP addresses.

    In the past, I’ve managed this by editing my local /etc/hosts file (or the Windows equivalent, whatever that’s called now). Turns out this wasn’t ideal. If my hosts don’t sync up with my colleagues’ hosts, stuff (usually cookies) can go wrong, for example. Plus, I strongly believe in setting up an environment that can be managed remotely (when possible) so less-technical members of our team don’t find themselves toiling under the burden of managing an obscurely-formatted text file deep within the parts of their operating systems that they — in all fairness — shouldn’t touch. Oh, and you also can’t do wildcards there.

    As I mentioned in a previous post, we have the great fortune of having all of our VM users conveniently on one operating system platform (Mac OS X), so this post will also focus there. A similar strategy could be used on Windows or Linux, without the shiny resolver bits; you’d just have to run all of your host’s DNS traffic through a VM-managed name resolver. These other operating systems might have something similar to resolver that I simply haven’t been enlightened to, and surely someone will point out my error on Twitter or email (please).

    The short version (which I just hinted at) is that we run a DNS server on our gateway VM (all of our users have one of these), and we instruct the workstation’s operating system to resolve certain TLDs via this VM’s IP address.

    We set up the VM side of this with configuration management, in our Salt states. Our specific implementation is a little too hacky to share (we have a custom Python script that loads hostname configuration from disk, running within systemd), but I’ve recently been tinkering with Dnsmasq, and we might roll that out in the non-distant future.

    Let’s say you want to manage the .sean TLD. Let’s additionally say that you have an app called saxophone (on a VM at and another called trombone (on, and you’d like to address these via URLs like https://saxophone.sean/ and https://trombone.sean/, respectively. Let’s also say that you might want to make sure that http://www.trombone.sean/ redirects to https on trombone.sean (without the www). Finally, let’s say that the saxophone app has many subdomains like blog.saxophone.sean, admin.saxophone.sean, cdn.saxophone.sean, etc. As you can see, we’re now out of one-liner territory in /etc/hosts. (Well, maybe a couple long lines.)

    To configure the DNS-resolving VM (“gateway” for us), with Dnsmasq, the configuration lines would look something like this:


    You can test with:

    $ dig +short @gateway.sean admin.saxophone.sean
    $ dig +short @gateway.sean www.trombone.sean
    $ dig +short @gateway.sean trombone.sean

    Now we’ve got the VM side set up. How do we best instruct the OS to resolve the new (fake) sean TLD “properly”?

    Mac OS X has a mechanism called resolver that allows us to choose specific DNS servers for specific TLDs, which is very convenient.

    Again, the short version of this is that you’d add the following line to /etc/resolver/sean (assuming the gateway is on on your workstation (not the VM):


    Once complete (and mDNSResponder has been reloaded), your computer will use the specified name server to resolve the .sean TLD.
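
    Conceptually, the per-domain lookup boils down to something like the following sketch. It is only a model of the idea, not how mDNSResponder is implemented; the function name is hypothetical, and the addresses in the usage note are placeholders from the documentation ranges, not our real ones.

```python
def pick_nameserver(hostname, resolvers, default):
    """Return the name server for the longest matching domain suffix.

    `resolvers` maps a domain (such as "sean") to a name server address,
    mirroring one file per domain under /etc/resolver.
    """
    labels = hostname.rstrip(".").lower().split(".")
    for start in range(len(labels)):
        suffix = ".".join(labels[start:])
        if suffix in resolvers:
            return resolvers[suffix]
    return default
```

    With resolvers = {"sean": "192.0.2.10"} and a default of "198.51.100.1", a lookup for admin.saxophone.sean goes to 192.0.2.10, while example.com still goes to the default server.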

    The longer version is that I don’t want to burden my VM users (especially those who get nervous touching anything in /etc — and with good reason), with this additional bit of configuration, so we manage this in our Vagrantfile, directly. Here’s an excerpt (we use something other than sean, but this has been altered to be consistent with our examples):

    # set up custom resolver
    if !File.exist? '/etc/resolver/sean'
      puts "Need to add the .sean resolver. We'll need sudo for this."
      puts "This should only happen once."
      print "\a"
      puts `sudo sh -c 'if [ ! -d /etc/resolver ]; then mkdir /etc/resolver; fi; echo "nameserver" > /etc/resolver/sean; killall -HUP mDNSResponder;'`
    end

    Then, when the day comes that we want to add a new app — call it trumpet — we can do all of it through configuration management from the ops side. We create the new VM in Salt, and the next time the user’s gateway is highstated (that is: the configuration management is applied), the Vagrantfile is altered, and the DNS resolver configuration on the gateway VM is changed. Once the user has done vagrant up trumpet, they should be good to point their browsers at https://trumpet.sean/. We don’t (specifically Vagrant doesn’t) even need sudo on the workstation after the initial setup.

  6. SSH: jump servers, MFA, Salt, and advanced configuration

    Let’s take a short break from our discussion of Vagrant to talk about how we use SSH in production at Fictive Kin.

    Recently, I went on a working vacation to visit my family in New Brunswick (think: east of the eastern time zone in Canada). While there, I needed to log in to a few servers to check on a few processes. I’ve done this in past years, and am frequently away from my sort-of-static home IP address. Usually, this required wrangling of AWS EC2 Security Groups to temporarily allow access from my tethered connection (whose IP changes at least a few times a day), but not this time. This time things were different.

    Over the past year or so, we’ve been reworking most of our production architecture. We’ve moved everything into VPC, reworked tests, made pools work within auto scale groups, and generally made things better. And one of the better things we’ve done is set up SSH to work through a jump host.

    This is certainly not a new idea. I’ve used hosts like this for many years. Even the way we’ve set it up is far from groundbreaking, but I thought it was worth sharing, since I've had people ask me about it, and it’s much more secure.

    The short version is that we’ve set up an SSH “jump” host to allow global SSH access on a non-standard port, and that host, in turn, allows us to access our AWS servers, including QA and production if access has been granted. There is no direct SSH access to any servers except the jump host(s), and they are set up to require multi-factor authentication (“MFA”) with Google’s Authenticator PAM module.

    This is more secure because almost none of our servers listen on the public Internet for SSH connections, and our jump host(s) listen on a non-standard port. This helps prevent compromise from non-targeted attacks such as worms, script kiddies, and Internet background radiation (IBR). Additionally, the server is configured with a minimal set of services, contains no secrets, requires public keys (no passwords) to log in, has a limited set of accounts, harshly rate-limits failed connections, and has the aforementioned MFA module set up, which we require our jump host users to configure.

    In practice, this is pretty easy to set up and use, both from the server side and for our users.

    From a user’s standpoint, we provision the account, including their public key, through configuration management (we use Salt). They then need to SSH directly to the jump host one time to configure google-authenticator, which asks a few questions, generates a TOTP seed/key, and gives the user a QR code (or seed codes) that they can scan into their MFA app of choice. We have users on the Google Authenticator app (both Android and iOS), as well as 1Password (which we acknowledge is not actually MFA, but it’s still better than single-factor).
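
    If you're curious what the MFA app does with that seed, here is a compact sketch of the TOTP algorithm (RFC 6238, using HMAC-SHA1 and 30-second steps, which are the usual defaults). It's for illustration only; it isn't the PAM module's code, and the function name is mine.

```python
import base64
import hashlib
import hmac
import struct


def totp(secret_b32, unix_time, step=30, digits=6):
    """Compute a time-based one-time password from a base32-encoded seed."""
    key = base64.b32decode(secret_b32, casefold=True)
    # The moving factor is the number of `step`-second intervals since the epoch.
    counter = struct.pack(">Q", int(unix_time) // step)
    digest = hmac.new(key, counter, hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte picks a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

    Both sides hold the same seed and roughly the same clock, so the server can compute the code for the current 30-second window and compare it with what the user typed. Calling totp(seed, time.time()) with your own seed should match what your app is showing.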

    Then, when they want to connect to a server in AWS, they connect via ssh — using their SSH private key — through the jump host (which asks for their current rotating TOTP/MFA code), and if successful allows them to proxy off to their desired server (which also requires their private key, but this is usually transparent to users).

    To illustrate, let’s say a user (sean) wants to connect to their app’s QA server (exampleappqa01.internal.example.net, which is in a VPC that has a CIDR of 10.77/16, or has IP addresses in the 10.77.* range). If they have their SSH configuration file set up properly, they can issue a command that looks like it’s connecting directly:

    ~$ ssh exampleappqa01.internal.example.net
    Authenticated with partial success.
    Verification code: XXXXXX

    This magic is possible through SSH’s ProxyCommand configuration directive. Here’s a sample configuration for internal.example.net:

    # jump host ; used for connecting directly to the jump host
    Host jumphost01.public.example.net
      ForwardAgent yes
      # non-standard port (older OpenSSH versions reject trailing comments)
      Port 11122
    # for hosts such as test.internal.example.net, through jumphost01
    Host *.internal.example.net
      ForwardAgent yes
      ProxyCommand nohup ssh -p 11122 %r@jumphost01.public.example.net nc -w1 %h %p
    # internal IP addresses for internal.example.net
    Host 10.77.*
      ForwardAgent yes
      ProxyCommand nohup ssh -p 11122 %r@jumphost01.public.example.net nc -w1 %h %p

    SSH transparently connects (via ssh) to the non-standard port (11122) on jumphost01.public.example.net and invokes nc (netcat — look it up if you’re unfamiliar, and you’re welcome! (-: ) to proxy the connection’s stream over to the actual host (%h) specified on the command line.
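
    To see what SSH ends up running, it can help to expand the tokens by hand. This sketch mimics only the three tokens used above (%h, %p, and %r, plus %% for a literal percent sign); it's a hypothetical helper for illustration, not something SSH exposes.

```python
def expand_proxy_command(template, host, port, remote_user):
    """Expand the %h, %p, %r, and %% tokens in a ProxyCommand-style string."""
    tokens = {"h": host, "p": str(port), "r": remote_user, "%": "%"}
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template) and template[i + 1] in tokens:
            out.append(tokens[template[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

    Feeding it the ProxyCommand from the configuration, with exampleappqa01.internal.example.net as the host, 22 as the port, and sean as the user, produces a command that logs in to the jump host as sean and asks nc to connect to exampleappqa01.internal.example.net on port 22.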

    Hope that all made sense. Please hit me up on Twitter (or email) if not.

    Here are a couple bonus scenes for reading this far. Our Salt state for installing Google Authenticator’s PAM module looks like this, on Debian:

    # (state IDs here are assumed; the file paths are Debian's defaults)
    include:
        - apt  # for backports
        - sshd-mfa.openssh  # for an updated version of sshd
    libqrencode3:
        pkg.installed: []
    libpam-google-authenticator:
        pkg.installed:
            # from http://ftp.us.debian.org/debian/pool/main/g/google-authenticator/libpam-google-authenticator_20130529-2_amd64.deb
            - name: libpam-google-authenticator
            - require:
                - pkg: libqrencode3
    # see: http://delyan.me/securing-ssh-with-totp/
    # nullok means that users without a ~/.google_authenticator will be
    # allowed in without MFA; it's opt-in
    # additionally, the user needs to log in to run `google-authenticator`
    # before they'd have a configured MFA app/token anyway
    /etc/pam.d/sshd:
        file.replace:
            - pattern: '^@include common-auth$'
            - repl: |
                auth [success=done new_authtok_reqd=done default=die] pam_google_authenticator.so nullok
                @include common-auth # modified
            - require:
                - pkg: libpam-google-authenticator
            - watch_in:
                - service: openssh6.7
    /etc/ssh/sshd_config:
        file.replace:
            - pattern: 'ChallengeResponseAuthentication no'
            - repl: |
                ChallengeResponseAuthentication yes
                AuthenticationMethods publickey,keyboard-interactive:pam
            - append_if_not_found: True
            - watch_in:
                - service: openssh6.7

    Finally, on this topic: I’ve been playing with assh to help manage my ssh config file, and it’s been working out pretty well. I suggest you give it a look.

  7. Vagrant: Bootstrapping

    In a previous post, we talked about why we use virtual machines, and Vagrant, at Fictive Kin. Now let’s get to how we do it.

    If you’re familiar with virtual-machine-based development environments that are set up through configuration management (we use Salt, but you could use something different like Ansible, Puppet, Chef, etc.), this probably won’t seem all that new to you, but there are a few things that we do, possibly uniquely, that might help with your systems.

    One thing that I didn’t mention on the previous post is that we’re a bit unlike many other startup kind of shops where developers and other team members (I’m going to just say “developers” to mean all team members from now on, for simplicity) focus on a single, large project. We have large projects, of course, but we tend to work on many of them at once. This doesn’t necessarily mean that each developer will be working on multiple projects at once, but our team — as a whole — will certainly need to be able to pull up access to several projects at the same time.

    We’ve experimented with a monolithic VM (each developer gets one large VM that runs all active projects on the same operating system and virtual instance), but we found that it was both too large (and therefore more complicated, more prone to failure) for most of our developers, and too hard to maintain. Sometimes different projects required different versions of the same RDBMS, for example, and that’s much easier if there’s only one version running. Or, more precisely: one version per virtual machine. Splitting apps onto their own VMs like this also reduces (but certainly far from eliminates) the headaches associated with deploying apps on different platforms — Mined runs our regular Python + Flask + Postgres stack, but Teuxdeux is a Ruby + Sinatra + MySQL app. Technically, these two apps could run on the same VM, but we’ve learned that it’s best to keep them separated.

    So, we give our developers a set of VMs — generally one per project or app. This not only separates concerns for things like database server versions, but also keeps one failing app from trampling on the others, for the most part. Luckily, Vagrant has good support for commanding multiple virtual machines from the same Vagrantfile.

    In addition to one VM per app, each developer has a primary VM called gateway that we use for shared infrastructure, such as DNS resolution (more on this in a later post), caching (we use Apt-Cacher NG to avoid downloading the same Debian packages on each VM), and Vagrantfile management.

    We also use the same “Vagrant Box” (base image file) for each of our VMs, and this image closely matches the base image (EBS-backed AMIs) we use on AWS. (I’ve been tempted to move to an app-specific image model for production, but for now we use nearly the same image everywhere, and we’d continue to do so for the VMs… and since this post is about the VMs, let’s just ignore the future-production parts for now.)

    That was more background information than I intended to share, but I do think it’s important. Let’s get on to a practical example workflow of how we’d get a new developer up and running on their new VMs.

    The first two steps are: install VirtualBox and install Vagrant. We’re lucky enough to have all of our developers’ workstations on the same operating system (Mac OS X), so these steps — and a few other things we do — are relatively simple.

    Next, we have a developer (in their shell) create a new directory, cd into that directory and download a simple “bootstrapping” Vagrantfile, which (essentially) looks like this:

    # -*- mode: ruby -*-
    # vi: set ft=ruby :
    VAGRANTFILE_API_VERSION = "2"
    VMNAME = ENV.fetch('VM_NAME', false)
    unless VMNAME
      abort("You must set env[VM_NAME]")
    end
    def bootstrap(vm)
      # common base box
      vm.box_url = "http://example.com/path/to/fk-vm-versionnumber.box"
      vm.box = "fk-vm-versionnumber"
      # remove default minion_id stuff; provision default minion file
      # the base image has a minion id of "UNCONFIGURED"
      # (the `grains:` key in the minion config below is an assumption)
      vm.provision :shell,
        :inline => 'if [ "`cat /etc/salt/minion_id`" = "UNCONFIGURED" ]; then
        systemctl stop salt-minion
        rm -rf /etc/salt/minion_id /etc/salt/pki/minion;
        cat > /etc/salt/minion<<EOF
    master: saltmaster.example.com
    master_port: 12345
    grains:
        env: development
    EOF
        systemctl start salt-minion
        fi'
    end
    Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
      config.vm.define :gateway, primary: true do |config|
        # bootstrap (common code; salt installer)
        bootstrap config.vm
        # host, network
        config.vm.host_name = "#{VMNAME}.gateway.example.net"
        config.vm.network "private_network", ip: ""
        config.vm.provider "virtualbox" do |v|
          v.customize ["modifyvm", :id, "--memory", 128]
        end
      end
    end

    If you’re already familiar with Vagrant, then most of this could be similar to what you might use, yourself. Possibly with the exception of the VM_NAME bits. You’ll probably also notice that this Vagrantfile only configures one VM, and not the set of VMs (one per app) that’s described above.

    Once our developer has this bootstrapping Vagrantfile, we assign them a “VM Name”, which for the most part is our developer’s first name — mine is sean, so we’ll use that as our example — and have them run the following command:

    VM_NAME=sean vagrant up

    This boots up the developer’s gateway VM for the first time, as sean.gateway.example.net (we have a domain name that we use instead of example.net), and once it’s running, Vagrant executes the inline provisioning script that’s in the Vagrantfile above.

    This provisioning script sets the VM’s Salt minion (the “Salt Minion” is the agent that runs on the “client” machine and connects to a “Salt Master” to get configuration management instructions and data) ID to sean.gateway.example.net, and configures the minion. It then starts the minion, which connects to our public Salt Master (saltmaster.example.com:12345 in our example).

    Once the VM is connected, someone from our ops team uses SSH to connect (through our jumphost — more on this later, too) to the saltmaster and manually verifies the key and approves/signs the sean.gateway.example.net credentials.

    (There’s a small opportunity for someone to spoof the same name that the VM is using and have our administrator mistakenly approve the wrong key (with the same name), but salt-key showing two sets of credentials with the same name (or a rejected set) would be suspicious enough to halt this process… and Salt administration is a topic for another day.)

    After approving the developer’s gateway VM, the administrator proceeds to “highstate” (effectively: apply the defined configuration management to) the VM. This step installs the required software on the gateway VM, such as the aforementioned Apt-Cacher NG.

    Here’s the key to our bootstrapping strategy: one of the bits of managed configuration is a templated /vagrant/Vagrantfile. This means that the Vagrantfile is managed by our configuration management system, and can be updated on the developer’s workstation.

    We (ops) intentionally can’t reach into a directory higher than the one containing the Vagrantfile, but this directory is — by default — mounted at /vagrant on the VMs. Vagrant takes care of managing this mount within our VMs, so each VM in our set has access to /vagrant, which is the same directory that contains the Vagrantfile — pretty convenient!

    Configuration management alters the Vagrantfile to contain not only an updated configuration for the gateway VM, but it also provisions the other VM configurations into the Vagrantfile, so once it’s complete, all a developer needs to do to work on another VM (such as our Mined app) is to vagrant up mined. The developer no longer even needs to set VM_NAME in the environment because we’ve captured that through the first gateway boot, and Salt wrote it directly to the Vagrantfile. Ops doesn’t even need to log into the saltmaster host to approve new keys for this additional VM, and I intend to write about this part, too.
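
    We do this templating inside Salt, but the idea is simple enough to sketch in a few lines of plain Python. Everything here is hypothetical (the template text, the function name, and the owner.app.example.net hostname pattern, which I've modelled on the gateway's hostname); it only shows how a list of apps can become a set of VM definitions.

```python
from string import Template

# One VM definition per app; `bootstrap` refers to the helper defined
# at the top of the bootstrapping Vagrantfile.
VM_BLOCK = Template(
    "  config.vm.define :$name do |config|\n"
    "    bootstrap config.vm\n"
    '    config.vm.host_name = "$owner.$name.example.net"\n'
    "  end\n"
)


def render_vm_blocks(owner, apps):
    """Render one Vagrant VM definition for each app name."""
    return "".join(VM_BLOCK.substitute(name=app, owner=owner) for app in apps)
```

    Rendering for owner sean and the apps mined and teuxdeux yields two define blocks. Because the owner's name is baked into the rendered file, the developer no longer needs VM_NAME in the environment.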

    This has been a relatively long post, but I think you’ll see that managing the Vagrantfile with Salt (or another config management platform) is pretty easy, and it greatly simplifies the burden on our developers (who might not be very skilled in system management).

    In future posts, we’ll talk a bit more about some of the other Vagrantfile customizations that I hinted at, that help our VMs shine.

  8. Vagrant: Why?

    In a previous post, I mentioned that we — at Fictive Kin — over the past several years have managed to build a development environment that works pretty well, and that I’m proud of.

    When (if) everything is working properly, we can get a new team member — even one with little technical knowledge (though indeed a small amount is required) — up and running on a development environment within a few minutes.

    This dev setup mirrors our other environments (qa + staging, production) as closely as possible. Core to my devops philosophy is that you should be working in the same configuration as where you deploy (again, with a few only-if-necessary changes).

    When things go wrong on production, they can be a huge pain to debug. Having a production setup with different paths, different sets of libraries/environments, or even a different operating system to what developers, designers, managers, and QA folks are using is just asking for trouble.

    In the past, I’ve worked on apps that had no “official” development environment. Developers were expected to set up the app on their own, usually without much in the way of instruction or documentation. Developers sometimes like this — we tend to like to do things our own way — but I’ve learned that while it might be convenient for development, it can be disastrous for production. What if the developer installs a very different version of the RDBMS (database) software? What if they’re using / to denote paths when they should be using \? Or if their workstation has a case-insensitive filesystem, but production’s filesystem (correctly) matches case? What if they’ve got the wrong, incompatible version of a library/package installed, or — worse yet — a completely different version of PHP, Python, Node, or Ruby?

    Even if everything goes well and the developer sets up their environment perfectly, there is a time penalty to this. I was once on a client contract at a very high per-hour rate and needed to spend almost two days setting up my (their) environment — which I was still not sure was correct — needlessly costing our client what might have been thousands of dollars, instead of being able to immediately focus on the project at hand.

    The process of getting team members up and running on a functional development environment can be painful. Especially if your team is remote and can’t always easily have someone inspect broken setups… or if members spend a lot of time travelling and are not always blessed with reliable, always-on Internet connections, making cloud-based development setups impractical.

    Our core values for this kind of setup are relatively simple in idea, but not always so in practice. They’ve changed a bit over time, but here are a few that spring to memory:

    • should be quick and easy to set up
    • shouldn’t require much technical knowledge beyond the ability to install some packaged software and navigate some simple commands (cd, mkdir, ls) in Terminal
    • must be able to be managed remotely, when online, so ops can patch security problems and make architectural changes — even if this management is invoked by the user
    • must mirror production as closely as possible
    • can require an Internet connection to get up and running, but then should work offline (such as on an airplane) whenever the app allows
    • must keep the app separated from other apps/development, and must be secure when the hosting workstation joins an untrusted network
    • joins a VPN so other peers/team members can be invited to “take a look at my VM to see what I’m working on”
    • actively prevents unskilled team members from making mistakes that could trickle into production, such as installing incorrect versions of libraries

    There are many other things that our development environments do, but I believe these to be the most important.

    To accomplish this, we use Vagrant, VirtualBox, Debian Linux, Salt, our app stack, and many other parts that we’ll avoid for the purposes of this article. Vagrant and VirtualBox allow us to run “virtual machine” computers within our main workstations.

    On production we also use Debian and Salt plus our app stack and the other bits. Instead of Vagrant + VirtualBox, we deploy in AWS EC2. But as mentioned above: our development stack mirrors production as closely as possible.

    I’m sure this practice matches what some of you already do. Others might use a containerized system (such as Docker). We don’t deploy on Docker, so we also don’t develop on Docker. Maybe one day we will deploy on Docker, at which time we’ll find a way to make our development environments use/simulate this.

    Still, others of you may develop directly on your workstations. Perhaps a Mac running the stock Apache + PHP, or a Windows box with Python and a dev server listening directly on an HTTP socket. I would discourage this, based on the above mantra of development-matches-production.

    Worse yet, some of you may be developing and deploying applications by editing files on production servers, or uploading individual files, directly. Please don’t do this; it only leads to pain.

    So, we’ve established some core guidelines, and a base set of software. In the next part, we’ll talk about how we (at Fictive Kin) bootstrap our development environment Virtual Machines.

  9. Revitalization Project

    It’s been over three years since I’ve posted anything new to my blog. This saddens me. I miss writing.

    This is my own fault, of course, and there are reasons for my absence…

    Part of it is shifting interests and altered career focus. I’m still working with Fictive Kin, but these days I’m doing almost no PHP, and I spend my days (and sometimes nights) with operations/systems administration. We’re doing really interesting stuff, and that occasionally leads down fun roads. For example, I’ve found time to write this while on my way to Korea to help lead a performance workshop. (I wrote this in May, but am only posting in August. So it goes.)

    Another part of this site’s decay has now hopefully been resolved: a rusty and dusty server that I just couldn’t find the time and motivation to update. I (finally) recently moved this site to a cloud instance in EC2 (Amazon Web Services), off of a five-plus-year-old dedicated Ubuntu box hosted in downtown Montreal. I no longer need the server to be close, ping-wise, to me, and the lack of flexibility with dedicated hardware was becoming unbearable (as far as finding time to maintain it goes).

    The new hosting setup much more closely matches what we do at work: Vagrant (for development), EC2, Route53, Salt, Python… and I’ve grown an appreciation for reducing cognitive load, so making things over here on the personal side work as closely as possible to things on the professional side is highly beneficial to my ability to remember things and fix problems.

    Python, you say? Yep.

    At work, we’ve moved most of our efforts to Python (Flask-based, but with a built-up library of custom code that helps us build new apps quickly). Despite my membership in its Cabal (developers/leadership), I hadn’t maintained the Habari install on this site for years (and hey… it still wasn’t exploited-in-minutes, WordPress style, so good for us). I have also fostered an increasing appreciation for simplicity and reliability over the years, and wanted to move to a static (generated) platform. I found Nikola. It met my needs, and was familiar (Jinja, relatively clear Python), so I moved this site off of Habari and Lithium.

    Some stuff isn’t yet ported (namely: my brewing recipes), and some things were simply removed (comments are gone, as are some irrelevant posts, and I didn’t feel the need to port over some of my pages), but I did manage to update my shares page… finally.

    There are a few things that we’ve built that I’m particularly proud of. One of those is our development setup. It’s been an iterative process, and one that was not without failure, but I’m happy to say that after over six years, we’re finally at a place where I consider our development setup to be both reliable and stable. Well, as reliable and stable as software is expected to be, at least.

    The short version is that we use Vagrant, VirtualBox, Salt, and a whole bunch of other pieces to mimic as-close-to-production-as-possible development environments for our users that — when things are working properly — can be set up in a few minutes, can be added to a new project or new app without much technical knowledge on the user side, and can — for the most part — be maintained, debugged, and repaired remotely, without having much control over the host machine (by design). (We’re a fully distributed team, so this last part is critical.)

    I’ll be writing about a few of the tricks/tips/ideas we’ve learned on this journey, here, as well as some other infrastructure that helps with operations. Hopefully I haven’t ignored this site so long that I’ve lost my entire readership. (-:

    I know I said it earlier, but I really do miss writing, and I miss the community of bloggers we once had in web development. We’ve let it become diluted with micro-posts, giving away our content to proprietary services, being perpetually insulted/insulting, slacktivism, word policing, and petty bickering. Is there ever hope of returning to something less pedestrian, less… juvenile? I sure hope so.

  10. Affirmative Wager

    There’s a very risky — but important — conversation that takes place in our community from time to time. It’s about gender and sexism. To be honest, I’m scared to write about this for fear that something I say might be twisted into a derogatory opinion that is not representative of the way I actually think and feel.

    I put this on Twitter, a while back:

    That said, I do have something to say, and I haven’t heard anyone else make this point, so I suppose I should step up and say it.

    When Chris and I select potential writers for Web Advent, we make a conscious decision to approach women who we think would do a good job. I also admit to doing this in the past when my role was to select conference speakers.

    To be clear, I’m not a fan of affirmative action — far from it. Sure, I’m a caucasian male, and I’m not so naïve as to think that there’s not a certain amount of unrequested privilege that comes with being born into this body, but I also strongly believe in the benefits of meritocracy — especially in online communities.

    Naïvety aside, I’ve worked to get where I am today, and I will keep working to advance further. When the opportunity presented itself (due to previous hard work), I moved to Montreal with barely two weeks’ salary in the bank, and decided to work at advancing to the top tier in our field. When I first met Kevin Yank, and saw what he’d accomplished with his first book, I was motivated to get involved in the more-public side of our community: writing, getting involved with PHP documentation, and speaking at conferences. I grew up in a relatively small city, in a timezone that most of you probably don’t even know exists (one hour ahead of America/New_York), where there was little opportunity to survive, let alone advance. I’m even horribly under-educated.

    I mention these things not to glorify my own accomplishments, but to illustrate my strong belief that people should be recognized for their contributions and their abilities, not for their race, gender, financial background, or most other reasons.

    So, I think that people should earn their place, and yet I make a determined effort to seek out female contributors. Sounds like a paradox. I’m not much of a fan of those.

    I have a theory about this. I hope I’m right, but I’m open to the idea that I might not be. My theory goes like this:

    The women who have advanced in our community, and have overcome the hardships that are inherent to being in such a minority, almost certainly function at a higher level than the average community member.

    That is to say that — in my experience, and anecdotally — most of the women who survive in our community are exceptional members of our community. They are very good at what they do, and they are (likely uncoincidentally) some of my favourite people.

    This theory tidily resolves the aforementioned paradox in my logic, and — to me at least — is evidence for why we ought to make an affirmative wager (hat tip to Pascal) in giving women a fair chance (in an often-unfair environment) when making event/opportunity selections, and why more women should be encouraged to participate in the present and future development of how the community operates.

    …at least until the gender imbalance is a thing of the past.