Exists is the enemy of good

We've all heard the adage "perfect is the enemy of good." I take this to mean: if you only accept perfect, you might miss a good-enough solution. I still strive for perfect, but I've learned to accept that good might sometimes be enough.

However, I've been saying something to friends and colleagues—for probably a decade now—that is closely related: exists is the enemy of good.

This idea is pretty simple, in principle: sometimes we miss a good-enough solution because a not-quite-good-enough solution is already out there and in use.

When I came up with this, it was specifically about bad software that seems good enough. Usually, stakeholders are less willing to invest in a good solution when one already exists.

Which of us hasn't hacked together a prototype that somehow became the code we released to production and has been doing a not-quite-good-enough job of serving clients since that fateful deadline? In my experience, these things often become absolute nightmares when it comes to maintenance (and security, and performance, and stability…), but they exist, so the urgency to replace them with a good version gets traded for the thousand papercuts of unexpected "well, I didn't think this would happen, so it's going to take me a few more hours than expected" or "oops; we didn't expect this kind of vulnerability…"

Or, we've cut corners: "Yeah, we've been meaning to fix that wildcarded security policy, but it's working for now, so we haven't made time to tighten things down." It exists though, so making it actually good doesn't get priority; it gets kicked down the road—sometimes forever.

This concept applies more broadly, too.

Imagine a community that is served by a single bridge. The bridge was built in 1958 and doesn't have safe walkways, or even a reasonable path for bicycles. Despite this, pedestrians and cyclists attempt to use the bridge daily—dangerously. The bridge exists, though. Replacing it with a better bridge is something the local council talks about every year or two, but no one is willing to risk their position by trying to justify the expense. Even worse: replacing the bridge would leave the community without a convenient link to the next town (they'd have to go the long way around) for a period of time while the replacement is being deployed, so even if someone were to get it into the budget, the citizens wouldn't stand for the downtime. Sure, everyone wants a better bridge—a good bridge—but the bridge that exists is the enemy of this. Now imagine how quickly (relatively, of course—this is local government after all) action would be taken if the bridge were to become unusable. A flash flood knocks out the bridge, and there's an opportunity—no, a downright necessity—to prop up a good bridge, because the old one practically no longer exists.

My home automation has been a bit neglected for the past year or so. Fundamentally, I'm still firmly in the position I mentioned (my lightbulbs can't join a botnet to DDoS you if they physically can't connect to the Internet (without a gateway)), but I'd like to tidy up a few things; recategorize, and maybe even explore a different system. But it exists and it mostly works, so it's hard to justify the time and hassle of naming/grouping rooms, or switching control platforms to something that is potentially more good.

When I've complained about Electron apps on my desktop, saying things like "This would be so much better if it used the native toolkit. It's super annoying that the tab key doesn't work right and this scrollbar is janky. It destroys my battery life," I've sometimes been met with the response "well, the Electron app is better than no app right?" Is it, though? If this bad app experience didn't exist, there might be some reason for them to build a good app. The motivation is squelched by "what we have is good enough" when usually it isn't good enough for me. My oven has what I would consider a terrible app. I have little hope that Anova will replace it with something good, though, because they have an app that exists.

Now, I'm not saying we need to greenfield everything, and we should usually avoid full rewrites—or at least approach them with a healthy dose of caution—but I've been thinking about this exists situation a lot and it's a real mess.

We need to be very careful about justifying bad experiences with "perfect is the enemy of good" when we should be striving harder for good itself. The popularity of this expression is responsible for a lot of garbage. Sometimes we just shouldn't do the quick and easy thing—even if we've tricked ourselves into thinking it's temporary. Exists is also the enemy of good.

A Secret Career Code

or: How, 25 years ago, I went viral and met King (Prince) Charles

Someone has been paying me to work on technology stuff for over 20 years, now. And there's one weird trick that I've learned that I think is worth passing on.

This isn’t really a secret code (and the one weird trick bit is a joke, of course), but it was so non-obvious to me in my early career, and it’s so valuable to me in my mid career that I wanted to share it—and also tell the story about going viral before anyone would have ever called it going viral.

The revelation is at the very end of this post, so feel free to skip ahead if you don’t want to hear the stories of how I learned it, but I think they’re fun.

Also, this turned out longer than I expected…

1998

In 1998, I graduated from high school. That last year, especially, I spent a good deal of my time really refining my would-be professional interests. I’d been programming for years, but multimedia was the hot thing, and I had a genuine interest in audio and video.

Our school had a really interesting (and maybe unique) setup where the 3 anglophone high schools in my city all shared the technical programs space, which we school-bussed to and from between classes, as needed.

This meant that instead of each of the high schools having to stock their own specialty labs, we shared resources with the other 2 main schools at a former 4th school. This Science and Technology Centre campus held well-equipped labs for all kinds of chronically-underfunded subjects:

  • electronics (I saw my first Tesla coil there)
  • home economics (like cooking, well-equipped kitchens)
  • automotive tech (a fully equipped garage with a lift where students could learn to work on cars and even fix their own)
  • control technologies (electropneumatics, actuators, PLCs, etc.)
  • traditional machine shop and CAD/CAM (with actual mini manufacturing/machining spaces)
  • wood shop (carpentry, but also a full shop with planers, jointers, lathes, cabinetry facilities, etc.)
  • a computer programming lab that was set up with actual compilers, not just typing and Microsoft Office classes
  • its own well-equipped theatre for drama, live music, and video presentations
  • likely many other spaces that I didn’t participate in or really even notice
  • and where I spent most of my time: the media lab

The 4th school has been turned back into a normal high school for years, now, so students there no longer have the same shared-resources opportunities that we were so privileged (yet, being kids, mostly unthankful) to participate in. It was called the MacNaughton Science and Technology Centre, for future web archaeologists searching for references (see my own archaeology woes below).

The media lab was a suite that contained a main classroom that was set up—in addition to regular classroom stuff—for viewing videos and teaching media literacy (we learned about Manufacturing Consent, and talked about 1984), but it also contained a small-but-equipped recording studio (I spent hundreds of hours in there recording friends, learning to mix, mic, bounce tracks, EQ…), and a video production suite that had an expensive-at-the-time near-broadcast quality camera (a Canon XL-1), and a few workstations for editing video, photos, digital painting, CD ROM production (hey, this was big at the time), and all of the related things.

Aside from the unbelievable opportunity that we had with all of this equipment, as high school students, the teacher who ran the lab—the only teacher we called by a first name, by his request—Andrew Campbell was one of those special teachers you look back on and recognize how much time and effort—how much care—they put into us. Weekends and evenings, he was often available to help us get set up, or at least unlock the doors (and secretly share his keys if he couldn’t be there for a Saturday recording session or to kick off a multi-hour video compile). I’m forever grateful for being a part of that program with Andrew back then.

Anyway, because of this experience and the time I was able to spend in the lab, I got pretty good at stringing videos together and producing them into something that was worth watching. There were a few of us from each of the 3 schools that had above-average abilities to run these System 7 SCSI-based Macs with Adobe Premiere.

In Canada, in the 90s—at least where I lived—it seemed like everyone (unless your family was exceptionally religious or you lived on a rural farm or—I guess—just didn’t have cable or access to a friend’s basement that had cable) watched MuchMusic. This was roughly the same as MTV in the ’States. Many of us just kind of ambiently turned it on if we weren’t actually watching something else on the TV—I fell asleep to the overnight VJs, many nights.

One recurring public service campaign, every year on MuchMusic, which was partly funded by the Canadian Government, was “Stop Racism!” which promoted March 21: the international day for the elimination of racial discrimination. If you grew up in Canada in the ’90s, you might remember the hand logo from this campaign.

Racism: Stop It hand logo

Each year, as part of this public service, they ran a competition that called on students from all over the country to make a short (90 second) video. A panel of judges would choose the best high-schooler-produced videos, and these would be cut down and aired every few hours on MuchMusic over the course of a month or so. The winners would also be honoured in a ceremony with musical guests and dignitaries.

A few days before the deadline to submit videos for this, a couple of my not-even-very-close friends from school asked me if we could put something together. I said sure. We made up a script (in hindsight, it was probably very cringey, with eggs spray-painted different colours, and the message was that they were all the same inside). We shot and edited the video, and submitted it barely on time (I think we had to postal-mail a VHS tape). We certainly did not expect to win.

As you may have guessed if you're still reading: we did win. There were several groups of winners, but we were the only winners from our region. They flew us to Vancouver (this was a really big deal to me; I'd only ever been on one plane trip before at this point) to participate in an awards ceremony hosted by Prince Charles, with several musical guests that we cared much more about, and we got to meet them all at the reception afterwards. I honestly don't remember what we talked about, but we definitely had an audience with the not-yet-king. (I do remember chatting with Great Big Sea, Matthew Good, and Chantal Kreviazuk, though.)

Our video aired on Much every few hours for the next month or so. We weren't famous, but if this kind of thing had happened 10 years later, and if it was driven by social networks, I'm pretty sure we'd have called it viral. This was as close to viral as you could get (without already being a celebrity, or doing something horribly newsworthy) in 1998.

There’s not much online about these events. I kept poor records back then. (-; I did find someone’s portfolio site about the event, and a video from another group that was entered in 1998 but didn’t win. Here are some newspaper clippings and a print + scan from our school’s Geocities page (thanks to my mom for clipping these way back then). Last thing: I found my official recognition certificate.

Certificate of Recognition … presented to Sean Coates … for Your winning entry in the 1998 Stop Racism National Video Competition

I learned a lot from this experience, but I think the biggest lesson was: if you’re any good at all, just try because you might succeed. I didn’t yet know the second part.

2001-2005

The second part of the lesson was revealed to me in 2004 or 2005, but let’s step back a little bit in time.

There are a few main events that put me on the path to learning what I learned. The first was when I took a new job in mid-2001, and I met Kevin Yank (who was living in Montreal at the time, and was just finishing up working at the place that had hired me—we became friends for a while there, too, until he moved away and we lost touch other than an occasional encounter in the Fediverse these days). Kev had just published a book with SitePoint: Build Your Own Database Driven Website Using PHP & MySQL. I learned a lot of PHP & MySQL from that book (I was working with Coldfusion at the time), and I still have my copy.

My copy of the aforementioned book.

What really struck me, though, was that he was close to my age, and wanted something more from his career, so he wrote this book. I thought I wanted that—or at least something like that—for my own career.

A few months later, I signed up to volunteer with the PHP documentation team and I remember it being a really big deal to me when they gave me my sean@php.net email address.

In 2003 (I think), I attended the first Conférence PHP Québec where I met Rasmus and many other people who would become peers in the PHP community. This conference eventually became ConFoo.

In late 2003 I decided I wanted to write for php|architect Magazine. I had a topic I liked and really wanted to see if I could—and still wanted to—build on that idea Kevin had imprinted on me. I did not expect it to be accepted—it seemed so prestigious. But my idea and draft/outline were accepted, and I was told I needed to write 4000 words, which—for a 23-year-old non-academic person—was a LOT of words (even this far-too-long blog post is "only" around 2000 words). But I did it. I was ecstatic to have my piece published in the January 2004 issue.

It was either later that year or in 2005 that I ran into the publisher of the magazine, Marco Tabini, on IRC, where we'd both been hanging out with PHP people for some time. He'd just lost his Editor-in-Chief, and was venting about having to pick up the editing duties in addition to his regular work. I—oh so naïvely—suggested that "I like editing" and he asked if I wanted to do it. After he reviewed an editing sample exercise he gave me, I started learning how to become the EiC and picked up the role pretty quickly.

So here's where all of this has been going: when I started editing the magazine, I got to see our content pipeline. We had to run four 4000-word main articles per month, in addition to the columns, and what I saw blew my mind. I'd come into this believing that it was the cream of the crop that wrote for this trade magazine. It was really the best people who got published. That's what I thought. I was so proud of my own accomplishment of writing for this prestigious magazine. And you know what? Some of the articles were excellent, but more often than not, I had to scrape together barely enough content to make each month's issue great (and—admittedly—sometimes not great or even all that good). I had to rewrite whole articles. We had to beg past authors to write again. The illusion of prestige was completely revealed to me.

And this… this is the secret I’ve learned: if you’re good at something, you’re probably better than most everyone else who does that something. There’s always going to be the top tier, sure, and you might be in that top tier, or you might not, but the average is shockingly average. It’s really not that hard for you to accomplish many of these things if you set reasonable goals, and it turns out the bar for some of those goals is much lower than I expected early in my career.

--

TL;DR: If you want to do something and think you can do it: just do it. If you're any good at all, you're probably better than most people who've done it, and far better than those who won't even try.

Modified Microphone

I've owned a Blue Yeti microphone for a little over five years, now. It's a pretty decent microphone for video calls, and I really like that it has its own audio interface for (pass-through) earphones, and a dedicated hardware mute feature. I've used it on probably 1500 calls, with only one complaint: the mute button makes noise.

For most of that five years, my colleagues could usually tell when I wanted into the conversation because the mute button has a satisfying ka-chunk tactile feedback that—unfortunately—transfers vibration into the thing to which it is attached, and that thing is a highly-sensitive vibration transducing machine… a microphone!

The ka-chunk noise has bothered me for years. Not so much for the signifier that I'm about to speak, but that I couldn't unmute myself without people expecting me to speak. Plus, it's kind of fundamentally annoying to me that a microphone has a button that makes noise.

So, I set out to fix this. I've been playing with these ESP32 microcontrollers from Espressif. These inexpensive chips (and development boards) are full of features and are way overpowered compared to the first-generation Arduinos I was playing with 15 years ago. They have WiFi and Bluetooth built in, come with enough RAM to do some basic stuff (and off-chip serial-bus-accessible RAM for more intensive work), and are easy to program (both from a software and hardware standpoint).

One of the cool built-in features of the ESP32 is a capacitive touch sensor. There are actually several of these sensors on board, and they're often used to sense touch for emulating buttons… silent buttons. You can see where I'm going with this.

I laid out a test circuit on a breadboard, and wrote a little bit of firmware (including some rudimentary debouncing) using the Arduino framework, then tested:

(This is on an ESP32 WROOM development board that's not ideal for some of my other projects, where I prefer the WROVER for the extra RAM, but is ideal—if not serious overkill—for this project.)

Not bad. But now I had to do the hard part: figure out how the microphone handles that button press. I looked around the Internet a little for someone who'd already done something similar, and I found some teardown videos, but couldn't track down a schematic.

I took the microphone apart, dug into the Yeti's board, and found the button. It was a bit more complicated than I'd imagined, mostly because the button is both illuminated (the built-in and light-piped LED will flash when muted, and will be lit solidly when unmuted), and mode-less (it's a momentary button). With some creative probing with a voltmeter and some less-than-ideal hotwiring of the +5V/GND provided by the USB interface, I tracked the button press down to a single pin on the switch, which is pulled low when the button is pressed. I soldered on a wire to use with my project:

I also soldered on a way-too-big barrel connector to tap into the USB interface's +5V and Ground lines. (Use what you have on-hand, right? Try not to get the connector genders backwards like I did… and also maybe solder better than me.)

My code would need to be modified to simulate this button "press". In addition to the debouncing, I'd have to pretend to press and release the button, and also instead of providing +5 volts to an output pin (the normal way to signal something like this), I'd actually have to sink the line to ground. Here's the (Arduino framework) code I ended up with (including some Serial debuggery):

#include <Arduino.h>

#define TOUCH_PIN 4     // T0 is GPIO4; referenced as T0 below
#define LED_PIN 2       // onboard LED: mirrors the pulse for debugging
#define EXT_LED_PIN 15  // wire soldered to the Yeti's mute-button line

#define PULSE_DELAY 500 // how long to hold the button line low, in ms

unsigned int threshold = 20; // touchRead() values below this count as a touch

// the last time the output pin was toggled:
unsigned long lastDebounceTime = 0;
// the debounce time
unsigned long debounceDelay = 500;

unsigned long pulseStartTime = 0;
bool toggledLow = false;

void gotTouch() {
  if (millis() < (lastDebounceTime + debounceDelay)) {
    return;
  }
  lastDebounceTime = millis();
  Serial.println("Touched.");

  // pulse the button
  digitalWrite(LED_PIN, LOW);
  digitalWrite(EXT_LED_PIN, LOW);
  Serial.println("(low)");
  pulseStartTime = millis();
  toggledLow = true;

}

void setup() {
  Serial.begin(9600);
  delay(100);
  Serial.println("Started.");
  pinMode(LED_PIN, OUTPUT);
  pinMode(EXT_LED_PIN, OUTPUT);
  digitalWrite(LED_PIN, HIGH);
  digitalWrite(EXT_LED_PIN, HIGH);
  touchAttachInterrupt(T0, gotTouch, threshold);
}

void loop() {
  // Touch0, T0, is on GPIO4
  Serial.println(touchRead(T0));  // get value using T0
  Serial.println("");
  delay(100);
  if (toggledLow && (millis() > (pulseStartTime + PULSE_DELAY))) {
    digitalWrite(LED_PIN, HIGH);
    digitalWrite(EXT_LED_PIN, HIGH);
    toggledLow = false;
    Serial.println("(high)");
  }
}

Please at least attempt to refrain from making fun of my weak C++ skills… but this seems to be surprisingly stable code in practice.

Now, I'd need to attach the ESP32 dev board, and reassemble the microphone part-way. The case of the Yeti is cast aluminum (or another softish metal, but I assume aluminum). This means that I could maybe—hopefully—use the case of the Yeti itself as the touch sensor. I rigged up a sensor wire to loosely connect to the mounting hole (which gets a thumb-screwed bolt, and will, by force, make a good connection to the case), since it's a huge pain (at best) to solder to aluminum:

Then, some bench testing before really putting it back together: it works! (You can see the blinking light in the middle of the Yeti's board go solid and back to blinking when I touch the unassembled-but-connected case.)

Great! Success! I managed to do that mostly in a single evening! I put the microphone back together, including putting the case-mounting bolts back in and… suddenly it no longer worked. I disassembled, hooked up the serial monitor via USB, and… well… it works! Maybe I just pinched a connection or shorted a pin or something. More Kapton tape! Reassembled, and… failed again. So I ran a cable through the volume knob hole and reassembled, and tested it in-situ. Weird. The capacitance numbers are all very low. In fact, they might always be just (very near) 0, plus some occasional noise. What?

After a day or two of head-scratching, and then some measuring to confirm the hypothesis, I realized that when the bolts go into the case, the case gets connected to the chassis, and the chassis is grounded to the board, then through to the USB ground. So the case itself gets grounded. And that's bad for a floating capacitance sensor. Turns out it didn't work after all.

This led to experimentation with some insulating enamel paint for transformers, and me certainly burning through a few too many brain cells with the fumes from said paint. I gave up on isolating the case from ground (which is probably good, anyway, all things considered), and made a little touch pad out of some aluminum ducting tape, some solderable copper tape, and a chunk of cardboard, that I happened to have on hand (back to using what you have here).

Actual successful hack.

As you can see in the video, I also added a little toggle switch to the back of the microphone that could allow me to completely detach the switching line from the microphone, just in case my hack started to fail, and I was on an important call—the stock mute button still works, of course. But, I'm happy to report that it's been nothing but stable for the past few weeks—it didn't even overflow the millis() checks, somehow, which still surprises me—and I use the new ka-chunk-free touch-to-mute-unmute feature of my microphone every day.

Anova Precision Oven (after a year)

This post doesn't exactly fit the normal theme of my blog, but over the past few weeks, several people have asked me about this, so I thought it was worth jotting down a few thoughts.

In January 2021, after eyeballing the specs and possibilities for the past few months, I splurged and ordered the Anova Precision Oven. I've owned it for over a year, now, and I use it a lot. But I wish it was quite a bit better.

There were a few main features of the APO that had me interested.

First, we have a really nice Wolf stove that came with our house. The range hood is wonderful, and the burners are great. The oven is also good when we actually need it (and we do still need it, sometimes; see below), but it's propane, so there are a few drawbacks. It takes a while to heat up because of a smart safety feature: basically a glow plug that won't let gas flow until it has built up enough heat to ignite the gas, preventing a situation where the oven holds an ideal gas-air mix and is ready to explode. It's also big. And it uses propane, which I love for the burners but which is (mostly) unnecessary for the oven: not only is it relatively expensive to run (we have a good price on electricity in Quebec because of past investments in giant hydro-electric projects), it also measurably reduces the air quality in the house if the hood fan isn't running (and running the fan in the dead of winter or summer cools/heats the house in opposition to our preference).

The second feature that had me really interested in the APO is the steam. I've tried and mostly-failed many times to get my big oven (this gas one and my previous electric oven) to act like a steam oven. Despite trying the tricks like a pan of water to act as a hydration reservoir, and spraying the walls with a mist of water, it never really steamed like I'd hoped—especially when making baguette.

I'm happy to say that the APO meets both of these needs very well: it's pretty quick to heat up—mostly because it's smaller; I do think it's under-powered (see below)—and the steam works great.

There are, however, a bunch of things wrong with the APO.

The first thing I noticed, after unpacking it and setting it up the first time, is that it doesn't fit a half sheet pan. It almost fits. I'm sure there was a design or logistics restriction (like maybe these things fit significantly more on a pallet or container when shipping), but sheet pans come in standard sizes, and it's a real bummer that I not only can't use the pans (and silicone mats) I already owned, but finding the right sized pan for the APO is also difficult (I bought some quarter and eighth sheet pans, but they don't fill up the space very well).

Speaking of the pan: the oven comes with one. That one, however, was unusable. It's made in such a way that it warps when it gets hot. Not just a little bit—a LOT. So much that if there happens to be liquid on the pan, it will launch that liquid off of the pan and onto the walls of the oven when the pan deforms abruptly. Even solids are problematic on the stock pan. I noticed other people complaining online about this and that they had Anova Support send them a new pan. I tried this. Support was great, but the pan they sent is unusable in a different way: they "solved" the warping problem by adding rigidity to the flat bottom part of the pan by pressing ribs into it. This makes the pan impossible to use for baking anything flat like bread or cookies.

I had to contact Support again a few months later when the water tank cracked (the oven uses this tank for steam, but also, even when steam mode is 0%, to improve the temperature reading by feeding some of the water to the thermometer in order to read the "wet bulb" temperature). The tank didn't leak, but the clear plastic cracked in quite a large pattern, threatening to dump several litres of water all over my kitchen at any moment. Support sent me a new tank without asking many questions. Hopefully the new one holds up; it hasn't cracked yet, after ~3 months.

Let's talk about the steam for a moment: it's great. I can get a wonderful texture on my breads by cranking it up, and it's perfect for reheating foods that are prone to drying out, such as mac & cheese—it's even ideal to run a small amount of steam for reheating pizza that might be a day or two too old. I rarely use our microwave oven for anything non-liquid (melting butter, reheating soups), and the APO is a great alternative way to reheat leftovers (slower than the microwave, sure, but it doesn't turn foods into rubber, so it's worth trading time for texture).

So it's good for breads? Well, sort of. The steam is great for the crust, definitely. However, it has a couple problems. I mentioned above that it's under-powered, and what I mean by that is two-fold: it has a maximum temperature of 250°C (482°F), and takes quite a long time to recover from the door opening—like, 10 minutes long. Both of these are detrimental to making an ideal bread. I'd normally bake bread at a much higher temperature—I do 550°F in the big oven, and pizza even hotter (especially in the outdoor pizza oven which easily gets up to >800°F). 482°F is—at least in my casual reasoning—pretty bad for "oven spring". My baguettes look (and taste) great, but they're always a bit too flat. The crust forms, but the steam bubbles don't expand quite fast enough to get the loaf to inflate how I'd like. The recovery time certainly doesn't help with this, either. I've managed to mitigate the slow-reheat problem by stacking a bunch of my cast iron pans in the oven to act as a sort of thermal ballast, and help the oven recover more quickly.

Also on the subject of bread: the oven is great for proofing/rising yeast doughs. Well, mostly great. It does a good job of holding the oven a bit warmer than my sometimes-cold-in-winter kitchen, and even without turning on the steam, it seems to avoid drying out the rising dough. I say "mostly" because one of the oven's fans turns on whenever the oven is "on", even at low temperatures. The oven has a pretty strong convection fan which is great, but this one seems to be the fan that cools the electronics. I realize this is necessary when running the oven itself, but it's pretty annoying for the kitchen to have a fairly-loud fan running for 24-48+ hours while baguette dough is rising at near-ambient temperatures.

The oven has several "modes" where you can turn on different heating elements inside the oven. The main element is the "rear" one, which requires convection, but there's a lower-power bottom element that's best for proofing, and a top burner that works acceptably (it's much less powerful than my big gas oven, for example) for broiling. One huge drawback to the default rear+convection mode, though, is that the oven blows a LOT of bubbling liquid all over the place when it's operating. This means that it gets really dirty, really quickly (see the back wall in the photo with the warped pan, above). Much faster than my big oven (even when running the convection fan over there). This isn't the end of the world, but it can be annoying.

The oven has controls on the door, as well as an app that works over WiFi (locally, and even when remote). I normally don't want my appliances to be on the Internet (see Internet-Optional Things), but the door controls are pretty rough. The speed-up/slow-down algorithm they use when holding the buttons for temperature changes is painful. It always overshoots or goes way too slow. They've improved this slightly, with a firmware update, but it's still rough.

The app is a tiny bit better, but it has all of the problems you might expect from a platform-agnostic mobile app that's clearly built on a questionable web framework. The UI is rough. It always defaults to the wrong mode for me (I rarely use the sous-vide mode), and doesn't seem to allow things like realtime temperature changes without adding a "stage" and then telling the oven to go to that stage. It's also dangerous: you can tell the app to turn the oven on, without any sort of "did one of the kids leave something that's going to catch fire inside the oven" interlock. I'd much prefer (even as optional configuration) a mode where I'm required to open and close the door within 90 seconds of turning the oven on, or it will turn off, or something like that.

Speaking of firmware… one night last summer, while I was sitting outside doing some work, my partner sent me a message: "did you just do something to the oven? it keeps making the sound like it's just turned on." I checked the app, and sure enough, it had just done a firmware update. I told her "it probably just restarted after the firmware update." When I went inside a little while later, I could hear it making the "ready" chime over and over. Every 10-15 seconds or so. I hadn't realized that this was what she'd meant. I tried everything to get it to stop, but it was in a reboot loop. We had to unplug it to save our sanity. Again, I looked online to see if others were having this issue, and sure enough, there were thousands of complaints about how everyone's ovens were doing this same thing. Some people were about to cook dinner, others had bread rising for baking that night, but we all had unusable ovens. They'd just reboot over and over, thanks to a botched (and automatic!) firmware update. Anova fixed this by the next morning, but it was a good reminder that software is terrible, and maybe our appliances shouldn't be on the Internet. (I've since put it back online because of the aforementioned door controls and the convenience of the—even substandard—app. I wish we could just use the door better, though.)

So, should you buy it? Well, I don't know. Truthfully, I'm happy we have this in our house. It's definitely become our main oven, and it fits well in our kitchen (it's kind of big, but we had a part of the countertop that turned out to be perfect for it). It really needs its own circuit, and is still underpowered at 120V (~1800W). However, I very often feel like I paid a lot of money to beta test a product for Anova (it cost around the same as I paid for my whole slide-in stove (oven + burners, a "range") at the previous house), and that's a bummer.

If they announce a Version 2 that fixes the problems, I'd definitely suggest getting that, or even V1 if you need it sooner, and are willing to deal with the drawbacks—I just wish you didn't have to.

A USB Mystery

TL;DR: got some hardware that didn't have a driver I could use. Dived into some packet captures, learned some USB, wrote some code.

I've been on a mini quest to instrument a bunch of real-life things, lately.

One thing that's been on my mind is "noise" around my house. So, after a very small amount of research, I bought a cheap USB Sound Pressure Level ("SPL") meter. (I accidentally bought the nearly-same model that was USB-powered only (it had no actual USB connection), before returning it for this one, so be careful if you happen to find yourself on this path.) Why not use a regular microphone attached to a regular sound device? Calibration.

When the package arrived, and I connected it, I found that it was not the same model from my "research" above. I was hoping that many/most/all of these meters had the same chipset. So now I had a little bit of a mystery: how do I get data from this thing?

I managed to find it in my system devices (on Mac; I'd have used lsusb on Linux—this thing will eventually end up on a Raspberry Pi, though):

❯ system_profiler SPUSBDataType
(snip)
WL100:

  Product ID: 0x82cd
  Vendor ID: 0x10c4  (Silicon Laboratories, Inc.)
  Version: 0.00
  Speed: Up to 12 Mb/s
  Manufacturer: SLAB
  Location ID: 0x14400000 / 8
  Current Available (mA): 500
  Current Required (mA): 64
  Extra Operating Current (mA): 0
(snip)

So, I at least knew it actually connects to the computer and identifies itself. But I really had no idea where to go from there. I found that Python has a PyUSB library, but even with that set up, my Mac was unhappy when I tried accessing USB devices from userspace (without sudo). I found there was also another way to connect to devices like this, over "HID", which is the protocol normally used for things like keyboards and mice, but is overall a simpler way to connect things.

The vendor supplied software on a mini-CD. Hilarious. There was also a very sketchy download link for Windows-only software. I have a Windows box in the networking closet for exactly this kind of thing (generally: testing of any sort). So, I went looking for some USB sniffing software, and a friend remembered that he thought Wireshark could capture USB. Perfect! I'd used Wireshark many times to debug networking problems, but never for USB. This was a lead nonetheless.

I fired up the vendor's software and connected the SPL meter:

Okay. It's ugly, but it seems to work. This app looks like it's from the Win32 days, and I thought that was no longer supported… but it works—or at least seems to. I asked Wireshark to capture on USBPcap1, and waited until I saw the app update a few times. I stopped the capture, saved the pcap session file, and loaded it into Wireshark on my main workstation. Unfortunately, I didn't have much of an idea what I was looking at.

I could, however, see what looked like the conversation between the computer (host) and the SPL meter (device 1.5.0). These packets were marked USBHID (as opposed to some other packets marked only USB), so this was a great clue:

This led to some searches around GET_REPORT and USB/HID/hidapi. Turns out that USB HID devices have "endpoints", "reports", and a lexicon of other terms I could only guess about. I didn't plan to become a full USB engineer, and was hoping I could squeeze by with a bit of mostly-naïve-about-USB-itself-but-otherwise-experienced analysis.

Eventually, I figured out that I can probably get the data I want by asking for a "feature report". Then I found get_feature_report in the Python hidapi bindings.

This function asks for a report_num and max_length:

def get_feature_report(self, int report_num, int max_length):
    """Receive feature report.

    :param report_num:
    :type report_num: int
    :param max_length:
    :type max_length: int
    :return: Incoming feature report
    :rtype: List[int]
    :raises ValueError: If connection is not opened.
    :raises IOError:
    """
    

These two values sound familiar. From the Wireshark capture:

Now I was getting somewhere. Let's use that decoded ReportID of 5 and a max_length (wLength) of 61.

import hid
import time

h = hid.device()
# these are from lsusb/system_profiler
h.open(0x10C4, 0x82CD)

while True:
    rpt = h.get_feature_report(5, 61)
    print(rpt)
    time.sleep(1)

This gave me something like:

[5, 97, 239, 60, 245, 0, 0, 1, 85, 0, 0, 1, 44, 5, 20, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[5, 97, 239, 60, 246, 0, 0, 1, 99, 0, 0, 1, 44, 5, 20, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[5, 97, 239, 60, 247, 0, 0, 1, 172, 0, 0, 1, 44, 5, 20, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[5, 97, 239, 60, 248, 0, 0, 3, 63, 0, 0, 1, 44, 5, 20, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[5, 97, 239, 60, 249, 0, 0, 2, 168, 0, 0, 1, 44, 5, 20, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[5, 97, 239, 60, 250, 0, 0, 1, 149, 0, 0, 1, 44, 5, 20, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[5, 97, 239, 60, 251, 0, 0, 1, 71, 0, 0, 1, 44, 5, 20, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

I played around with this data for a bit, and eventually noticed that the 8th and 9th (rpt[7:9]) values were changing. Sure enough, if I made a noise, the 9th value would change, and if it was a loud noise, the 8th value would also change:

1, 85
1, 99
1, 172
3, 63
2, 168

I was about to start throwing data into a spreadsheet when I made a guess: what if that's a 16 (or 12…) bit number? So, if I shift the first byte over 8 bits and add the second byte…

(1 << 8) + 85 == 341
(1 << 8) + 99 == 355
(1 << 8) + 172 == 428
(3 << 8) + 63 == 831
(2 << 8) + 168 == 680

The meter claims to have a range of 30dBA to 130dBA, and it sits around 35dBA when I'm not intentionally making any noise, in my office with the heat fan running. Now I'm worried that it's not actually sharing dBA numbers and maybe they're another unit or… wait… those ARE dBA numbers, just multiplied to avoid the decimal! 34.1, 35.5, 42.8, 83.1, 68.0
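Putting those pieces together, the decoding is just a shift, an add, and a divide. A minimal helper based on the reports captured above (`parse_dba` is my name for it, not anything from the vendor):

```python
def parse_dba(report):
    """Decode an SPL reading from a WL100 feature report.

    The 8th and 9th bytes (report[7:9]) form a big-endian 16-bit
    value: the dBA level multiplied by 10 to avoid the decimal.
    """
    raw = (report[7] << 8) + report[8]
    return raw / 10.0


# The first report captured above decodes to 34.1 dBA:
print(parse_dba([5, 97, 239, 60, 245, 0, 0, 1, 85, 0, 0, 1, 44, 5, 20,
                 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))
# prints 34.1
```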

Got it!

Anyway, I wrote some (better) code to help read this data, in Python: scoates/wl100 on GitHub. Let me know if you use it!

IoT: Internet-Optional Things

I both love and hate the idea of "smart" devices in my home. It's tough to balance the convenience of turning lights on and off automatically and adjusting thermostats from my phone against the risk that all of my devices are doing evil things to my fellow Internet citizens. But I think I've landed on a compromise that works.

I've had Internet-connected devices for a long time now. I've even built devices that can go online. At some point a year or two ago, I realized that I could do better than what I had. Here's a loose list of requirements I made up for my own "IoT" setup at home:

  • works locally as a primary objective
  • works when my Internet connection is down or slow
  • avoids phoning home to the vendor's (or worse: a third party's) API or service
  • can be fully firewalled off from the actual Internet, ideally through a physical limitation
  • isn't locked up in a proprietary platform that will either become expensive, limited, or will cease to exist when it's no longer profitable

My setup isn't perfect; it doesn't fully meet all of these criteria, but it's close, and it's been working well for me.

At the core of my home IoT network is a device called the Hubitat Elevation. It works as a bridge between the actual Internet and my devices, which are (for the most part) incapable of connecting to the Internet directly. My devices, which range from thermostats, to lights, to motion sensors, to switchable outlets, and more, use either Zigbee or Z-Wave to communicate with each other (they form repeating mesh networks automatically) and with the hub. Again, they don't have a connection to my WiFi or my LAN, except through the hub, because they're physically incapable of connecting to my local network (they have neither ethernet ports nor WiFi radios). The hub brokers all of these connections and helps me control and automate these devices.

The hub—the Hubitat Elevation—is fairly inexpensive, and is not fully "open" (as I'd like), but has good integration abilities, is well-maintained, is compatible with many devices (many of them are devices compatible with the more-proprietary but similar SmartThings hub), and has an active community of people answering questions, coming up with new ideas, and maintaining add-ons. These add-ons are written in Groovy, which I hadn't really used in earnest before working with the Hubitat, but you can write and modify them to suit your needs.

The hub itself is mostly controlled through a web UI, which I'll admit is clunky, or through a mobile app. The mobile app adds capabilities like geo-fencing, presence, and notifications. The hub can also be connected to other devices; I have mine connected to my Echo, for example, so I can say "Alexa turn off the kitchen lights."

The devices themselves are either mains-powered (such as my Hue lightbulbs, baseboard thermostats, and switching outlets), or are battery powered (such as motion sensors, door contact switches, and buttons). Many of these devices also passively measure things like local temperature, and relay this data, along with battery health, to the hub.

I'm going to get into some examples of how I have things set up (though not a full getting-started tutorial), but first I want to mention a few things that weren't immediately obvious to me, and that will get you off on the right foot if you choose to follow a similar path to mine.

  • Third-party hub apps are a bit weird in their structure (there are usually parent and child apps), and keeping them up to date can be a pain. Luckily, Hubitat Package Manager exists, and many add-ons can be maintained through this useful tool.
  • There's a built-in app called "Maker API" which provides a REST interface to your various devices. This technically goes against one of my loose requirements above, but I have it limited to the LAN and requiring authentication, which feels like a fair trade-off for the times I want this kind of connection.
  • There's an app that will send measured data to InfluxDB, a time-series database that I have running locally on my NAS (as a Docker container on my Synology DSM), and it works well as a data source for Grafana (the graphs in this post come from Grafana).
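To give a sense of what Maker API exposes: device commands are plain GET requests against the hub on the LAN. This sketch only builds the URL; the hub address, app ID, device ID, and token below are placeholders, not real values:

```python
def maker_api_url(hub_ip, app_id, device_id, command, token):
    """Build a Hubitat Maker API command URL (LAN-only in my setup)."""
    return (
        f"http://{hub_ip}/apps/api/{app_id}/devices/{device_id}"
        f"/{command}?access_token={token}"
    )


# e.g. turning on a switch (all values hypothetical):
url = maker_api_url("192.168.1.50", "12", "34", "on", "TOKEN")
```

Fetching that URL (with a real token) triggers the command; dropping the `/{command}` segment returns the device's current state as JSON.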

Programmable Thermostats

My house is heated primarily through a centralized heat pump (which also provides cooling in the summer), but many rooms have their own baseboard heaters + independent thermostats. Before automation, these thermostats were either completely manual, or had a hard-to-manage on-device scheduling function.

I replaced many of these thermostats with connected versions. My main heat pump's thermostat (low voltage) is the Honeywell T6 Pro Z-Wave, and my baseboard heaters (line voltage) are now controlled with Zigbee thermostats from Sinopé.

Managing these through the web app is much better than the very limited on-device UI of programmable thermostats. The Hubitat has a built-in app called "Thermostat Scheduler." Here's my office, for example (I don't like cold mornings (-: ):

Lighting

An often-touted benefit of IoT is lighting automation, and I have several lights I control with my setup. Much of this is through cooperation with the Hue bridge, which I do still have on my network, but I could remove at some point, since the bulbs speak Zigbee. The connected lights that are not Hue bulbs are mostly controlled by Leviton Decora dimmers, switches, and optional dimmer remotes for 3-way circuits. Most of this is boring/routine stuff such as "turn on the outdoor lighting switch at dusk and off at midnight," configured on the hub with the "Simple Automation Rules" app, but I have a couple more interesting applications.

Countertop

My kitchen counter is long down one side—a "galley" style. There's under-cabinet countertop lighting the whole length of the counter, but it's split into two separate switched/dimmed circuits of LED fixtures—one to the left of the sink and one to the right. I have these set to turn on in the morning and off at night. It's kind of annoying that there are two dimmers that work independently, though, and I find it aesthetically displeasing when half of the kitchen is lit up bright and the other half is dim.

Automation to the rescue, though. I found an app called Switch Bindings that allows me to gang these two dimmers together. Now, when I adjust the one on the left, the dimmer on the right matches the new brightness, and vice versa. A mild convenience, but it sure is nice to be able to effectively rewire these circuits in software.

Cellar

I have an extensive beer cellar that I keep cool and dark most of the time. I found myself sometimes forgetting to turn off the lights next to the bottles, and—as someone who is highly sensitive to mercaptans/thiols (products of lightstuck beers, a "skunky" smell/fault)—I don't want my beer to see any more light than is necessary.

With my setup, I can have the outlet that my shelf lighting is plugged into turn on and off when the door is opened or closed. There's also a useful temperature sensor, plus a moisture sensor on the floor, so I can track cellar temperature over time and get a quick notification if the floor drain backs up or a bottle somehow breaks/leaks enough for the sensor to notice.

these lights turn on and off when the door is opened and closed, respectively

I also receive an alert on my phone when the door is opened/closed, which is increasingly useful as the kids get older.

Foyer

Our house has an addition built onto the front, and there's an entrance room that is kind of separated off from the rest of the living space. The lighting in here has different needs from elsewhere because of this. Wouldn't it be nice if the lights in here could automatically turn on when they need to?

Thanks to Simple Automation Rules (the built-in app), and a combination of the SmartThings motion sensor and the DarkSky Device Driver (which will need to be replaced at some point, but it still works for now), I can have the lights in there—in addition to being manually controllable from the switch panels—turn on when there's motion, but only if it's dark enough outside for this to be needed. The lights will turn themselves off when there's no more motion.
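The rule's logic amounts to a two-condition check, something like the sketch below (the lux threshold is a made-up number; the real condition lives in Simple Automation Rules on the hub, not in code I wrote):

```python
def foyer_lights_on(motion_active, outdoor_lux, dark_below=400):
    """Turn on the foyer lights only on motion AND when it's dark out."""
    return motion_active and outdoor_lux < dark_below
```

The "turn off when there's no more motion" half is just the inverse: when the motion sensor reports inactive for its cooldown period, the lights switch off regardless of the light level.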

Ice Melting

We have a fence gate that we keep closed most of the time so Stanley can safely hang out in our backyard. We need to use it occasionally, and during the winter this poses a problem because that side of the house has a bit of water runoff that is not normally a big deal, but in the winter, it sometimes gets dammed up by the surrounding snow/ice and freezes, making the gate impossible to open.

In past winters, I've used ice melting chemicals to help free the gate, but it's a pain to keep these on hand, and they corrode the fence posts where the powder coating has chipped off. Plus, it takes time for the melting to work and bags of this stuff are sometimes hard to find (cost aside).

This year, I invested in a snow-melting mat. Electricity is relatively cheap here in Quebec, thanks to our extensive hydroelectric investment, but it's still wasteful to run this thing when it's not needed (though arguably less wasteful than bag after bag of ice melter). I'm still tweaking the settings on this one, but I have the mat turn on when the temperature drops and off when the ambient temperature is warmer. It's working great so far:
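The on-when-cold/off-when-warmer behaviour is essentially a hysteresis band, which keeps the mat from flapping on and off around a single setpoint. A sketch of the idea (the thresholds are illustrative; I'm still tuning the real ones):

```python
def mat_power(temp_c, currently_on, on_below=-2.0, off_above=2.0):
    """Hysteresis: switch on below one threshold, off above another."""
    if temp_c <= on_below:
        return True
    if temp_c >= off_above:
        return False
    return currently_on  # inside the band: keep the current state
```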

Desk foot-warming mat

My office is in the back corner of our house. The old part. I suspect it's poorly insulated, and the floor gets especially cold. I bought a warming mat on which to rest my feet (similar to this one). It doesn't need to be on all of the time, but I do like to be able to call for heat on demand, and have it turn itself off after a few minutes.

I have the mat plugged into a switchable outlet. In the hub, I have rules set up to turn this mat on when I press a button on my desk. The mat turns itself off after 15 minutes, thanks to a second rule in the built-in app "Rule Machine". Warm toes!

When I first set this up, I found myself wondering if the mat was already on. If I pressed the button and didn't hear a click from the outlet's relay, I guessed it was already on. But the hub allows me to get a bit more insight. I didn't want something as distracting (and redundant) as an alert on my phone. I wanted something more of an ambient signifier. I have a Hue bulbed lamp on my desk that I have set up to tint red when the mat is on, and when it turns off, to revert to the current colour and brightness of another similar lamp in my office. Now I have a passive reminder of the mat's state.

Graphs

Another interesting aspect of all of this (to me, as someone who uses this kind of data in my actual work, anyway) is that, now that we have these non-intrusive devices, I can get a visual representation of the different sensors in my house.

For example, you can see here that I used my office much less over the past two weeks (both in presence and in the amount I used the foot mat), since we took a much-needed break (ignore the CO2 bits for now, that's maybe a separate post):

As I mentioned on Twitter a while back, a graph helped me notice that a heating/cooling vent was unintentionally left open when we switched from cooling to heating:

Or, want to see how well that outdoor mat on/off switching based on temperature is working?

An overview of the various temperatures in my house (and outside; the coldest line) over the past week:

Tools

What's really nice about having all of this stuff set up, aside from the aforementioned relief that it can't be compromised directly from the Internet, is that I now have tools that I can use within this infrastructure. For example, when we plugged in the Christmas tree lights this year, I had the outlet's schedule match the living room lighting, so it never gets accidentally left on overnight.

Did it now

I originally wrote this one to publish on Reddit, but also didn't want to lose it.

Many many years ago, I worked at a company in Canada that ran some financial services.

The owner was the kind of guy who drove race cars on weekends, and on weekdays would come into the programmers' room to complain that our fingers weren't typing fast enough.

On a particularly panicky day, one of the web servers in the pool that served our app became unresponsive. We had these servers hosted in a managed rack at a hosting provider offsite. After several hours of trying to bring it back, our hosting partner admitted defeat and declared that they couldn't revive WEB02. It had a hardware failure of some sort. We only had a few servers back then, and they were named according to their roles in our infrastructure: WEB01, WEB02, CRON01, DB03, etc.

Traffic and backlog started piling up with WEB02 out of the cluster, despite our efforts to mitigate the loss (which we considered temporary). Our head of IT was on the phone with our hosting provider trying to come up with a plan to replace the server. This was before "cloud" was a thing and each of our resources was a physically present piece of hardware. The agreed-upon solution was to replace WEB02 with a new box, which they were rushing into place from their reserve of hardware, onsite.

By this point, the race-car-driving, finger-typing-speed-complaining owner of the company was absolutely losing it. It seemed like he was screaming at anyone and everyone who dared make eye contact, even if they had truly nothing to do with the server failure or its replacement.

Our teams worked together to get the new box up and running in record time, and were well into configuring the operating system and necessary software when they realized that no one wanted to go out on a limb and give the new machine a name. President Screamy was very particular about these names for some reason and this had been the target of previous rage fests, so neither the hosting lead nor our internal soldiers wanted to make a decision that they knew could be deemed wrong and end up the target of even more yelling. So, they agreed that the hosting provider would call the CEO and ask him what he'd like to name the box.

But before that call could be made, the CEO called our hosting provider to tear them up. He was assured that the box was almost ready, and that the only remaining thing was whether to name it WEB02 to replace the previous box or to give it a whole new name like WEB06. Rage man did not like this at all, and despite being at the other end of the office floor from his office, we could all hear him lay fully into the otherwise-innocent phone receiver on the other end: "I just need that box up NOW. FIX IT. I don't care WHAT you call it! It just needs to be live! DO IT NOW!"

And that, friends, is how we ended up with a web pool of servers named WEB01, WEB03, WEB04, WEB05, and (the new server) DOITNOW. It also served well as a cautionary tale for new hires who happened to notice.

Cache-Forever Assets

I originally wrote this to help Stoyan out with Web Performance Calendar; republishing here.

A long time ago, we had a client with a performance problem. Their entire web app was slow. The situation with this client's app was a bit tricky; this client was a team within a very large company, and often—in my experience, anyway—large companies mean that there are a lot of different people/teams who exert control over deployed apps and there's a lot of bureaucracy in order to get anything done.

The client's team that had asked us to help with slow page loads only had passive access to logs (they couldn't easily add new logging), and they were mostly powerless to do things like optimize SQL queries, of which there were logs already, and really only controlled the web app itself, which was a very heavy Java/Spring-based app. The team we were working with knew just enough to maintain the user-facing parts of the app.

We, a contracted team brought in to help with guidance (and we did eventually build some interesting technology for this client), had no direct ability to modify the deployed app, nor did we even get access to the server-side source code. But we still wanted to help, and the client wanted us to help, given all of these constraints. So, we did a bit of what-we-can-see analysis, and came up with a number of simple, but unimplemented optimizations. "Low-hanging fruit" if you will.

These optimizations included things like "improve the size of these giant images (and here's how to do it without losing any quality)", "concatenate and minify these CSS and JavaScript assets" (the app was fronted by an HTTP/1.x reverse proxy), and "improve user-agent caching". It's the last of these that I'm going to discuss here.

Now, before we get any deeper into this, I want to make it clear that the strategy we implemented (or, more specifically: advised the client to implement) is certainly not ground-breaking—far from it. This client, whether due to geographic location, or perhaps being shielded from outside influence within their large corporate infrastructure, had not implemented even the most basic of browser-facing optimizations, so we had a great opportunity to teach them things we'd been doing for years—maybe even decades—at this point.

We noticed that all requests were slow. Even the smallest requests. Static pages, dynamically-rendered for the logged-in user pages, images, CSS, even redirects were slow. And we knew that we were not in a position to do much about this slowness, other than to identify it and hope the team we were in contact with could request that the controlling team look into the more-general problem. "Put the assets on a CDN and avoid the stack/processing entirely" was something we recommended but it wasn't even something we could realistically expect to be implemented given the circumstances.

"Reduce the number of requests" was already partially covered in the "concatenate and minify" recommendation I mentioned above, but we noticed that, because all requests were slow, the built-in strategy of using the stack's HTTP handler to return 304 Not Modified when a request could be satisfied via Last-Modified or ETag was, itself, sometimes taking several seconds to respond.

A little background: normally (lots of considerations like cache visibility glossed over here), when a user agent makes a request for an asset that it already has in its cache, it tells the server "I have a copy of this asset that was last modified at this specific time" and the server, once it sees that it doesn't have a newer copy, will say "you've already got the latest version, so I'm not going to bother sending it to you" via a 304 Not Modified response. Alternatively, a browser might say "I've got a copy of this asset that you've identified to have unique properties based on this ETag you sent me; here's the ETag back so we can compare notes" and the server will—again, if the asset is already current—send back a 304 response. In both cases, if the server has a newer version of the asset it will (likely) send back a 200 and the browser will use and cache a new version.
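Server-side, that revalidation decision looks roughly like this (a simplification: real servers also handle weak ETags, `If-None-Match: *`, date-parsing edge cases, and more):

```python
from email.utils import parsedate_to_datetime


def should_send_304(request_headers, etag, last_modified):
    """Return True when the client's cached copy is still current."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # ETag validation takes precedence when the client sends one.
        return if_none_match == etag
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        return parsedate_to_datetime(if_modified_since) >= last_modified
    return False  # no validators: send the full 200 response
```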

It's these 304 responses that were slow on the server side, like all other requests. The browser was still making the request and waiting a (relatively) long time for the confirmation that it already had the right version in its cache, which it usually did.

The strategy we recommended (remember: because we were extremely limited in what we expected to be able to change) was to avoid this Not Modified conversation altogether.

With a little work at "build" time, we were able to give each of these assets not only a unique ETag (as determined by the HTTP dæmon itself), but a fully unique URL, based on its content. By doing so, and setting appropriate HTTP headers (more on the specifics of this below), we could tell the browser "you never even need to ask the server if this asset is up to date." We could cache "forever" (in practice: a year in most cases, but that was close enough for the performance gain we needed here).

Fast-forward to the present. For our own apps, we do use a CDN, but I still like to use this cache-forever strategy. We now often deploy our main app code to AWS Lambda, and find ourselves uploading static assets to S3, to be served via CloudFront (Amazon Web Services' CDN service).

We have code that collects (via either a pre-set lookup, or by filesystem traversal) the assets we want to upload. We do whatever preprocessing we need to do to them, and when it's time to upload to S3, we're careful to set certain HTTP headers that indicate unconditional caching for the browser:

# Excerpted method: assumes `import os`, `botocore`, a
# `from datetime import datetime, timedelta`, and `s3 = boto3.resource("s3")`.
def upload_collected_files(self, force=False):
    for f, dat in self.collected_files.items():

        key_name = os.path.join(
            self.bucket_prefix, self.versioned_hash(dat["hash"]), f
        )

        if not force:
            try:
                s3.Object(self.bucket, key_name).load()
            except botocore.exceptions.ClientError as e:
                if e.response["Error"]["Code"] == "404":
                    # key doesn't exist, so don't interfere
                    pass
                else:
                    # Something else has gone wrong.
                    raise
            else:
                # The object does exist.
                print(
                    f"Not uploading {key_name} because it already exists, and not in FORCE mode"
                )
                continue

        # RFC 2616:
        # "HTTP/1.1 servers SHOULD NOT send Expires dates more than one year in the future"
        headers = {
            "CacheControl": "public,max-age=31536000,immutable",
            "Expires": datetime.today() + timedelta(days=365),
            "ContentType": dat["mime"],
            "ACL": "public-read",
        }

        self.upload_file(
            dat["path"],
            key_name,
            self.bucket,
            headers,
            dry_run=os.environ.get("DRY_RUN") == "1",
        )

The key name (which extends to the URL) is a shortened representation of a file's contents, plus a "we need to bust the cache without changing the contents" version on our app's side, followed by the asset's natural filename, such as (the full URL): https://static.production.site.faculty.net/c7a1f31f4ed828cbc60271aee4e4f301708662e8a131384add7b03e8fd305da82f53401cfd883d8b48032fb78ef71e5f-2020101000/images/topography-overlay.png

This effectively tells S3 to relay Cache-Control and Expires headers to the browser (via CloudFront) to only allow the asset to expire in a year. Because of this, the browser doesn't even make a request for the asset if it's got it cached.

We control cache busting (such as a new version of a CSS, JS, image, etc.) completely via the URL; our app has access (via a lookup dictionary) to the uploaded assets, and can reference the full URL to always be the latest version.
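Deriving such a key is straightforward. Here's a hypothetical stand-in for the `versioned_hash` call in the upload code above (the digest algorithm and exact layout are illustrative, not our precise implementation):

```python
import hashlib
import os


def versioned_key(path, version):
    """Content digest + manual cache-busting version + natural filename."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return f"{digest}-{version}/{os.path.basename(path)}"
```

Because the digest changes whenever the file's bytes change, any new version of an asset automatically lands at a new URL, and the old URL remains valid (and cached) for anyone still holding a stale page.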

The real beauty of this approach is that the browser can entirely avoid even asking the server if it's got the latest version—it just knows it does—as illustrated here:

Developer tools showing "cached" requests for assets on faculty.com

How I helped fix Canada's COVID Alert app

On July 31st, Canada's COVID Alert app was made available for general use, though it does not have support for actually reporting a diagnosis in most provinces, yet.

In Quebec, we can run the tracing part of the app, and if diagnosis codes become available here, the app can retroactively report contact. It uses the tracing mechanism that Google and Apple created together, and in my opinion—at least for now—Canadians should be running this thing to help us all deal with COVID-19. I won't run it forever, but for now, it seems to me that the benefits outweigh the "government can track me" fear (it's not actually tracking you; it doesn't even know who you are), and it's enabled on my phone.

But, before I decided to take this position and offer up my own movement data, I wanted to be sure the app is doing what it says it's doing—at least to the extent of my ability to be duly diligent. (Note: it's not purely movement data that's shared—at least without more context—but actual physical interactions with other people whose phones are within Bluetooth LE radio range.)

Before installing the app on my real daily-carry phone, I decided to put it on an old phone I still have, and to do some analysis on the most basic level of communication: who is it contacting?

In 2015, I gave a talk at ConFoo entitled "Inspect HTTP(S) with Your Own Man-in-the-Middle Non-Attacks", and this is exactly what I wanted to do here. The tooling has improved in the past 5 years, and firing up mitmproxy, even without ever having used it on this relatively new laptop, was a one-liner, thanks to Nix:

nix-shell -p mitmproxy --run mitmproxy

This gave me a terminal-based UI and proxy server that I pointed my old phone at (via the Wi-Fi network settings, under HTTP Proxy, pointed to my laptop's local IP address). I needed to have mitmproxy create a Certificate Authority that it could use to generate and sign "trusted" certificates, and then have my phone trust that authority by visiting http://mitm.it/ in mobile Safari and doing the certificate acceptance dance (this is even more complicated on the latest versions of iOS). Also worth noting: certain endpoints, such as the Apple App Store, appear to use certificate pinning, so you'll want to do things like install the COVID Alert app from the App Store before turning on the proxy.

Once I was all set up to intercept my own traffic, I visited some https:// URLs and saw the request flows in mitmproxy.

I fired up the COVID Alert app again, and noticed something strange… something disturbing:

shows that the app is accessing clients.google.com

In addition to the expected traffic to canada.ca (I noticed it's using .alpha.canada.ca; I suspect that's due to the often-reported, unbearably long bureaucratic hassle of getting a .canada.ca TLS certificate, but that's another story), my phone, when running COVID Alert, was contacting Google.

HEAD https://clients3.google.com/generate_204

A little web searching helped me discover that this is a commonly-used endpoint that helps developers determine if the device is behind a "captive portal" (an interaction that requires log-in, payment, or at least acceptance of terms before granting wider access to the Web). I decided that this was probably unintended by the developers of COVID Alert, but it still bothered me that an app the government wants us to run, one designed for tracking interactions between people['s devices], was telling Google that I'm running it, and disclosing my IP address in doing so:

shows that the User Agent header identifies the app as

(Note that the app clearly identifies itself in the User-Agent header.)
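To make the mechanism concrete: a reachability/captive-portal check simply probes an endpoint that is known to always return 204; anything else (a login page, a redirect) means something intercepted the request. A minimal sketch with the standard library follows (the function name and the probe URL parameter are mine, not the app's):

```python
from urllib.request import Request, urlopen


def behind_captive_portal(probe_url: str) -> bool:
    """True if the probe looks intercepted (non-204) or unreachable."""
    try:
        resp = urlopen(Request(probe_url, method="HEAD"), timeout=5)
        return resp.status != 204
    except OSError:
        return True  # no connectivity at all; treat it like a portal
```

The privacy problem isn't the technique itself; it's that the default probe target was Google, so every check disclosed the requester's IP address (and, via the User-Agent header, the app's identity) to a third party.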

A bit more quick research turned up a statement by Canada's Privacy Commissioner:

An Internet Protocol (IP) address can be considered personal information if it can be associated with an identifiable individual. For example, in one complaint finding, we determined that some of the IP addresses that an internet service provider (ISP) was collecting were personal information because the ISP had the ability to link the IP addresses to its customers through their subscriber IDs.

It's not too difficult to imagine that Google probably has enough data on Canadians for this to be a real problem.

I discovered that this app is maintained by the Canadian Digital Service, and that the source code is on GitHub, but that the code itself didn't directly contain any references to clients3.google.com.

It's a React Native app, and I figured that the call out to Google must be in one of the dependencies, which—considering the norm with JavaScript apps—are pleasantly restrained mostly to React itself. I had no idea which of these libraries was calling out to Google.

Now, I could have run this app in the iOS Simulator (which I did end up doing to test my patches, below), but I thought "let's see what my actual phone is doing." I threw caution to the wind and ran checkra1n on my old phone, which gave me ssh access, which in turn allowed me to copy the app's application bundle to my laptop, where I could do a little more analysis. (Note: the app is bundled as CovidShield because it was previously developed by volunteers at Shopify and was later renamed by CDS, or so I gather.)

~/De/C/iphone/CovidShield.app  grep -r 'clients3.google.com' *
main.jsbundle:__d(function(g,r,i,a,m,e,d){Object.defineProperty(e,"__esModule",{value:!0}),
e.default=void 0;var t={reachabilityUrl:'https://clients3.google.com/generate_204',
reachabilityTest:function(t){return Promise.resolve(204===t.status)},reachabilityShortTimeout:5e3,
reachabilityLongTimeout:6e4,reachabilityRequestTimeout:15e3};e.default=t},708,[]);

(Line breaks added for legibility.) Note reachabilityUrl:'https://clients3.google.com/generate_204'. Found it! A bit more searching led me to a package called react-native-netinfo (which was directly in the above-linked package.json), and its default configuration that sets the reachabilityUrl to Google.

Now that I knew where it was happening, I could fix it.

To make this work the same way, we needed a reliable 204 endpoint that the app could hit, and to keep with the expectation that this app should not "leak" data outside of canada.ca, I ended up submitting a patch for the server side code that the app calls. (It turns out that this was not necessary after all, but I'm still glad I added this to my report.)

I also patched, and tested the app code itself via the iOS Simulator.

I then submitted a write-up of what was going wrong and why it's bad, to the main app repository, as cds-snc/covid-alert-app issue 1003, and felt pretty good about my COVID Civic Duty of the day.

The fine folks at the Canadian Digital Service seemed to recognize the problem and agree that it was something that needed to be addressed. A few very professional back-and-forths later (I'll be honest: I barely knew anything about the CDS and I expected some runaround from a government agency like this, and I was pleasantly surprised), we landed on a solution that simply didn't call the reachability URL at all, and they released a version of the app that fixed my issue!

With the new version loaded, I once again checked the traffic and can confirm that the new version of the app does not reach out to anywhere but .canada.ca.

A mitmproxy flow showing traffic to canada.ca and not google.com

New Site (same as old site)

You're looking at the new seancoates.com.

"I'm going to pay attention to my blog" posts on blogs are… passé, but…

I moved this site to a static site generator a few years ago when I had to move some server stuff around, and had let it decay. I spent most of the past week of evenings and weekend updating to what you see now.

It's still built on Nikola, but now the current version.

I completely reworked the HTML and CSS. Turns out—after not touching it in earnest in probably a decade (‼️)—CSS is a much more pleasant experience these days. Lately, I've been doing almost exclusively back-end and server/operations work, so it was actually a bit refreshing to see how far CSS has come along. Last time I did this kind of thing, nothing seemed to work—or if it did work, it didn't work the same across browsers. This time, I used Nikola's SCSS builder and actually got some things done, including passing the Accessibility tests (for new posts, anyway) in Lighthouse (one of the few reasons I fire up Chrome), and a small amount of Responsive Web Design to make some elements reflow on small screens. When we built the HTML for the previous site, so long ago, small screens were barely a thing, and neither were wide browsers for the most part.

From templates that I built, Nikola generates static HTML, which has a few limitations when it comes to serving requests. The canonical URL for this post is https://seancoates.com/blogs/new-site-same-as-old-site. Note the lack of trailing slash. There are ways to accomplish this directly with S3 (where I wanted to store the generated HTML + assets), but they're always janky. I've been storing static sites on S3 and serving them up through CloudFront for what must be 7+ years now, and it works great as long as you don't want to do anything "fancy" like redirects. You just have to name your files in a clever way, and be sure to set the metadata's Content-Type correctly. The page you're reading right now comes from a .md file that is compiled into [output]/blogs/new-site-same-as-old-site/index.html. Dealing with the "directory" path and index.html is a pain, so I knew I wanted to serve it through a very thin HTTP handling app.
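The mapping just described (pretty URL without a trailing slash, index.html behind the scenes) can be sketched as a pure function. This is a hypothetical helper for illustration; the actual app.py wires each route explicitly:

```python
def s3_key_for(path: str) -> str:
    """Map a pretty URL path to the key of the generated file in the bucket."""
    path = path.strip("/")
    if not path:
        return "output/index.html"
    if path.endswith((".xml", ".atom", ".xsl", ".css")):
        # real files (feeds, styles) map straight through
        return f"output/{path}"
    # "directory" pages compile to .../index.html
    return f"output/{path}/index.html"
```

For example, `s3_key_for("/blogs/new-site-same-as-old-site")` yields `output/blogs/new-site-same-as-old-site/index.html`.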

At work, we deploy mostly on AWS (API Gateway and Lambda, via some bespoke tooling, a forked and customized runtime from Zappa, and SAM for packaging), but all of that seemed too heavy for what amounts to a static site with a slightly-more-intelligent HTTP handler. Chalice had been on my radar for quite a while now, and this seemed like the perfect opportunity to try it.

It has a few limitations, such as horrific 404s, and I couldn't get binary serving to work (but I don't need it, since I put the very few binary assets on a different CloudFront + S3 distribution), but all of that considered, it's pretty nice.

Here's the entire [current version of] app.py that serves this site:

 import functools

 from chalice import Chalice, Response
 import boto3


 app = Chalice(app_name="seancoates")
 s3 = boto3.client("s3")
 BUCKET = "seancoates-site-content"

 REDIRECT_HOSTS = ["www.seancoates.com"]


 def fetch_from_s3(path):
     k = f"output/{path}"
     obj = s3.get_object(Bucket=BUCKET, Key=k)
     return obj["Body"].read()


 def wrapped_s3(path, content_type="text/html; charset=utf-8"):
     if app.current_request.headers.get("Host") in REDIRECT_HOSTS:
         return redirect("https://seancoates.com/")

     try:
         data = fetch_from_s3(path)
         return Response(
             body=data, headers={"Content-Type": content_type}, status_code=200,
         )
     except s3.exceptions.NoSuchKey:
         return Response(
             body="404 not found.",
             headers={"Content-Type": "text/plain"},
             status_code=404,
         )


 def check_slash(handler):
     @functools.wraps(handler)
     def slash_wrapper(*args, **kwargs):
         path = app.current_request.context["path"]
         if path[-1] == "/":
             return redirect(path[0:-1])
         return handler(*args, **kwargs)

     return slash_wrapper


 def redirect(path, status_code=303):
     return Response(
         body="Redirecting.",
         headers={"Content-Type": "text/plain", "Location": path},
         status_code=status_code,
     )


 @app.route("/")
 def index():
     return wrapped_s3("index.html")


 @app.route("/assets/css/{filename}")
 def assets_css(filename):
     return wrapped_s3(f"assets/css/{filename}", "text/css")


 @app.route("/blogs/{slug}")
 @check_slash
 def blogs_slug(slug):
     return wrapped_s3(f"blogs/{slug}/index.html")


 @app.route("/brews")
 @app.route("/shares")
 @app.route("/is")
 @check_slash
 def pages():
     return wrapped_s3(f"{app.current_request.context['path'].lstrip('/')}/index.html")


 @app.route("/archive")
 @app.route("/blogs")
 def no_page():
     return redirect("/")


 @app.route("/archive/{archive_page}")
 @check_slash
 def archive(archive_page):
     return wrapped_s3(f"archive/{archive_page}/index.html")


 @app.route("/rss.xml")
 def rss():
     return wrapped_s3("rss.xml", "application/xml")


 @app.route("/assets/xml/rss.xsl")
 def rss_xsl():
     return wrapped_s3("assets/xml/rss.xsl", "application/xml")


 @app.route("/feed.atom")
 def atom():
     return wrapped_s3("feed.atom", "application/atom+xml")

Not bad for less than 100 lines (if you don't count the mandated whitespace, at least).

Chalice handles the API Gateway, Custom Domain Name, permissions granting (for S3 access, via IAM policy) and deployments. It's pretty slick. I provided DNS and a certificate ARN from Certificate Manager.

Last thing: I had to trick Nikola into serving "pretty URLs" without a trailing slash. It has two modes, basically: /blogs/post/ or /blogs/post/index.html. I want /blogs/post. Now, avoiding the trailing slash usually invokes a 30x HTTP redirect, because the HTTPd serving your static files needs to add it so you get the directory (index). But in my case, I was handling HTTP a little more intelligently, so I didn't want that. You can see that my app.py above handles the trailing-slash redirects via the check_slash decorator, but to get Nikola to handle this in the RSS/Atom feeds (I had control in the HTML templates, but not in the feeds), I had to trick it with some ugliness in conf.py:

# hacky hack hack monkeypatch
# strips / from the end of URLs
from nikola import post

post.Post._unpatched_permalink = post.Post.permalink
post.Post.permalink = lambda self, lang=None, absolute=False, extension=".html", query=None: self._unpatched_permalink(
    lang, absolute, extension, query
).rstrip("/")

I feel dirty about that part, but pretty great about the rest.