Dogfooding At Armory

Apr 18, 2022 by Dan Peach

We recently onboarded the main server component of our new software delivery platform — codenamed Borealis — onto Borealis itself. I’m going to talk a bit about Borealis, the experience of dogfooding at Armory, and what we hope to learn from it.

Borealis and Deploy Engine

The application we onboarded is called Deploy Engine. The name is uninspired and is out of place among the sea of ostensibly Grecian software systems — Kubernetes, Argo, Tekton, etc. — but is descriptive of its role within Borealis: it orchestrates software deployment strategies.

Using the Borealis CLI or API, a user can provide a high-level description of their deployment: First, I’d like 10% of production traffic to go to my new software version. Next, I’d like the deployment to wait thirty minutes and then I’d like to run a canary analysis of application metrics. Next, I’d like to increase traffic to 50% — and so on. Deploy Engine, in concert with a few other subsystems, orchestrates this kind of deployment workflow.
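To make that description concrete, here is a minimal sketch of such a workflow modeled as an ordered list of steps. It is purely illustrative: the names `SetWeight`, `Pause`, `CanaryAnalysis`, and `describe` are hypothetical and are not the Borealis CLI or API format.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class SetWeight:
    """Route this percentage of production traffic to the new version."""
    percent: int


@dataclass(frozen=True)
class Pause:
    """Wait before moving on to the next step."""
    minutes: int


@dataclass(frozen=True)
class CanaryAnalysis:
    """Run a canary analysis of application metrics."""


# Hypothetical step type; a real platform would define its own schema.
Step = Union[SetWeight, Pause, CanaryAnalysis]

# The example from the paragraph above: 10% of traffic, wait thirty minutes,
# run a canary analysis, then increase traffic to 50%.
STRATEGY: List[Step] = [
    SetWeight(percent=10),
    Pause(minutes=30),
    CanaryAnalysis(),
    SetWeight(percent=50),
]


def describe(steps: List[Step]) -> List[str]:
    """Render each step as a human-readable line, in execution order."""
    lines = []
    for step in steps:
        if isinstance(step, SetWeight):
            lines.append(f"route {step.percent}% of traffic to the new version")
        elif isinstance(step, Pause):
            lines.append(f"wait {step.minutes} minutes")
        else:
            lines.append("run canary analysis")
    return lines
```

An orchestrator in this style would walk the list in order and carry out each step before moving on to the next.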

Dogfooding at Armory

In my experience, the mental state of even the most empathetic, thoughtful, and experienced software engineer is divorced from that of a user.

I have two analogies:

Dogfooding is hard, painful work, but it’s one of the best ways to experience your software as your users will. It’s not a replacement for user feedback, since we have biases and beliefs that shape how we experience our software and that may differ from those of our customers, but it’s an essential part of the development lifecycle.

What we’ve learned

I’ll share two anecdotes about our experience of dogfooding Deploy Engine.

When we first tried to onboard Deploy Engine onto the Borealis platform, we hit strange behavior that we’d never seen in testing or development. I soon tracked it down to a bug that’s a little embarrassing, a little funny, and fortunately won’t take long to describe.

Our build uses Kustomize to generate manifests for five Kubernetes resources. When we tried using Borealis to deploy for the first time, only two of the five resources were deployed. We had unit tests that ensured that this case would work, so I was mystified. After some digging, I found the answer.

Our API accepts both JSON-encoded and YAML-encoded Kubernetes manifests. We check the first 512 bytes of a manifest: if those bytes look like JSON, we interpret the manifest as JSON; if not, we interpret it as YAML. Unfortunately, in the case where we decided the manifest was encoded as YAML, we only used the first 512 bytes: we threw the rest away! Oops.
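The sketch below reproduces this class of bug in miniature. It is not Deploy Engine's code: the function names, the naive `---` document splitter, and the sample manifests are all assumptions made for illustration. The only detail taken from the story is the 512-byte sniffing window.

```python
import json
from typing import List

SNIFF_BYTES = 512  # size of the prefix inspected to guess the encoding


def looks_like_json(prefix: bytes) -> bool:
    """Guess the encoding: JSON manifests start with '{' or '['."""
    return prefix.lstrip().startswith((b"{", b"["))


def split_yaml_documents(raw: bytes) -> List[str]:
    """Naively split a YAML stream on '---' separator lines."""
    docs: List[str] = []
    current: List[str] = []
    for line in raw.decode("utf-8", errors="replace").splitlines():
        if line.strip() == "---":
            if any(l.strip() for l in current):
                docs.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):
        docs.append("\n".join(current))
    return docs


def parse_buggy(raw: bytes) -> List[str]:
    prefix = raw[:SNIFF_BYTES]
    if looks_like_json(prefix):
        return [json.dumps(json.loads(raw))]
    # Bug: the sniffed prefix is parsed instead of the full payload,
    # so everything past the first 512 bytes is silently dropped.
    return split_yaml_documents(prefix)


def parse_fixed(raw: bytes) -> List[str]:
    prefix = raw[:SNIFF_BYTES]
    if looks_like_json(prefix):
        return [json.dumps(json.loads(raw))]
    # Fix: the prefix is used only to pick a decoder.
    return split_yaml_documents(raw)


def make_manifest(name: str) -> str:
    """Build a hypothetical ConfigMap manifest a few hundred bytes long."""
    return (
        "apiVersion: v1\n"
        "kind: ConfigMap\n"
        "metadata:\n"
        f"  name: {name}\n"
        "data:\n"
        f"  payload: {'x' * 200}\n"
    )


def sample_stream(count: int = 5) -> bytes:
    """Join several manifests into one multi-document YAML stream."""
    return "---\n".join(make_manifest(f"cm-{i}") for i in range(count)).encode("utf-8")
```

With five manifests of this size the stream is well over 512 bytes, so `parse_buggy(sample_stream())` returns fewer than five documents while `parse_fixed(sample_stream())` returns all five. A stream that fits entirely inside the sniffing window passes through both functions unchanged, which is how tests built on tiny manifests can miss the problem.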

Our unit tests checked that we could deploy multiple manifests, but we used unrealistically small manifests — smaller than 512 bytes. This is a bug I’m glad a customer never had to experience, and is a bug we likely wouldn’t have caught without dogfooding.

My second anecdote is also brief. Borealis has a UI that lists all recent deployments; Deploy Engine determines how those deployments are sorted. Working with the product team, we defined the sort based on our experience and what we thought would make sense to customers.

Once we started using that UI as users, we realized the sort didn’t make sense at all — it bothered us, and we decided to change it. That very same day, one of our first users asked us to update the sort on the deployments page; their suggested change was the exact change we had just implemented. We were able to go back to the customer and tell them that their fix would be ready by the end of the day.

Today all the services used by Borealis are deployed using Borealis. Additional teams throughout the company have also started using it as they’ve seen it improve. Teams across Armory are giving us feedback. Some of it is tough to hear, but we’ll catch bugs and improve the product in ways that our customers will love. I hope to give you an update on Borealis, our progress, and our dogfooding journey in a few months.

If you would like to try out Borealis, we are still accepting design partners, and I’d love to hear your feedback on it as well.


