Feature Flags vs Canaries

May 2, 2017 by Armory

Feature flags and canaries have different objectives despite appearing similar: feature flags expose a specific feature to a sample set of users and then measure its impact, while canaries expose a specific version of the entire application to a sample set of users and then measure its impact.

So what’s the difference?

Here’s the high-level view:

Feature Flags

Most companies use feature flags to understand the impact of a specific feature on user- or business-level metrics. For instance, how does this new auditing feature impact our user engagement, positively or negatively?

On the opposite end of the spectrum are canaries. When you execute a canary deployment, you’re seeking consistency and looking for metrics to remain within a standard deviation. When you detect an anomaly, you typically roll back the change. Additionally, you’re also looking at signals like system- and network-level metrics to determine whether the deployment was a success.
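The standard-deviation check described above can be sketched in a few lines. This is a minimal illustration, not a real canary analysis engine; the metric values and the two-deviation threshold are assumptions for the example.

```python
import statistics

def canary_ok(baseline_samples, canary_value, max_deviations=2.0):
    """Return True if the canary's metric stays within max_deviations
    standard deviations of the baseline's mean."""
    mean = statistics.mean(baseline_samples)
    stdev = statistics.stdev(baseline_samples)
    return abs(canary_value - mean) <= max_deviations * stdev

# Hypothetical error rates (percent) observed on the current version's instances:
baseline = [0.8, 1.1, 0.9, 1.0, 1.2, 0.95]
print(canary_ok(baseline, 1.05))  # within normal variation -> keep rolling forward
print(canary_ok(baseline, 4.0))   # anomaly -> roll back
```

A real system would run this comparison continuously across many metrics at once, which is exactly what makes manual canarying so labor-intensive.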

Canaries: Best Practices & Tools

In practice, canaries are most commonly used through manual processes: an instance is manually rolled forward to the new version while a team watches multiple metrics dashboards to monitor the instance’s health. If all goes well, the team may choose to deploy to a few more servers, then repeat the process. The downside of this approach is that it is hard to determine which metrics matter and what to do in the event of failure.

During a deployment, hours are wasted agreeing that there is an issue, and even more time is spent deciding what to do about it. The journey to automating this process is further complicated by the assumption that the underlying mechanism to deploy – and roll back – is reproducible, safe, and consistent. This is exactly why Netflix moved away from this style of deployment to an automated method. Their automated systems look at over 1,000 metrics in real time when deploying the Netflix API!

Consistent Deployments

The best way to achieve these goals is to use immutable infrastructure: it limits the risk of external resources being unavailable at run time, and it builds confidence in the immutable package itself. Of course, we believe Spinnaker is the best way to get a solid initial deployment. It becomes the basis of any deployment, since you can make assumptions about deploying an AMI and keep application configuration logic encapsulated from deployment logic.

Once you have a predictable deployment model, the next step is to instrument the applications and environments. In order to have an intelligent deployment, you’ll need lots of information for the system to make good decisions. While unit tests and regression/integration tests provide a solid base for gaining confidence in the deployment, there are many other tools that give you better ROI for gaining additional confidence.
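To make the instrumentation step concrete, here is a toy in-process metrics helper. In production you would ship these values to a metrics system rather than keep them in memory; the names (`requests.total`, `db.query`) are illustrative, not from the original post.

```python
import time
from collections import defaultdict

class Metrics:
    """Minimal in-process instrumentation: counters and latency timers."""
    def __init__(self):
        self.counters = defaultdict(int)
        self.timings = defaultdict(list)

    def incr(self, name):
        """Increment a named counter (e.g. requests served, errors seen)."""
        self.counters[name] += 1

    def time_call(self, name, fn, *args):
        """Run fn(*args), recording its wall-clock duration under name."""
        start = time.perf_counter()
        try:
            return fn(*args)
        finally:
            self.timings[name].append(time.perf_counter() - start)

metrics = Metrics()
metrics.incr("requests.total")
result = metrics.time_call("db.query", sum, [1, 2, 3])  # result is 6
```

Counters and timings like these are the raw signals an automated canary system (or a human on a dashboard) compares between versions.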

More focus on Instrumentation, Less on Integration Testing

When asked what makes a deployment “safe,” the easy answer is to add more integration tests. While you can theoretically reach 100% coverage with integration and regression tests, the cost of doing so is very high. After a set of core integration tests, each additional test provides diminishing value but costs just as much to create and maintain. In most cases, teams find themselves maintaining an application (the integration suite) just to test their applications, because the integration tests become so complex. Maintaining integration tests becomes difficult over time, and soon the tests are left behind while the application continues to evolve, leaving it with no coverage.

An alternative to this approach is to use canaries, but in most cases applications and systems don’t have enough instrumentation in place to signal an automated control system (or a human) that the deployment is heading for failure. Canarying is now a common deployment strategy within newer PaaS frameworks like Kubernetes; however, neither Kubernetes nor any other system knows what a “good deployment” is, because that is very specific to your application and organization. This is why instrumentation is so critical to the success of your deployments.
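One way to read “specific to your application” is that each team must write down its own health criteria. The sketch below encodes a hypothetical definition of a good deployment as per-metric checks; the metric names and limits are invented for illustration.

```python
# Hypothetical, application-specific definition of a "good deployment":
# each team decides which metrics matter and what their limits are.
HEALTH_CHECKS = {
    "error_rate_pct": lambda v: v < 1.0,
    "p99_latency_ms": lambda v: v < 250,
    "5xx_per_minute": lambda v: v < 5,
}

def deployment_healthy(observed):
    """observed maps metric name -> measured value for the canary."""
    return all(check(observed[name]) for name, check in HEALTH_CHECKS.items())

print(deployment_healthy({"error_rate_pct": 0.4,
                          "p99_latency_ms": 180,
                          "5xx_per_minute": 2}))  # True: safe to continue
```

No platform can supply this table for you; it falls out of the instrumentation and the metrics your organization actually cares about.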

Feature Flags: Best Practices & Tools

Creating feature flags is a popular method used by companies like Facebook: it lets them ship hundreds of features in a single version of Facebook while exposing some features only to a sample of the population, measuring each against a control group, and making a product decision about whether the feature should be exposed to the entire population.
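The core mechanic of exposing a feature to a sample of users is usually a deterministic hash of the user and feature, so each user keeps the same assignment across sessions. A minimal sketch, with an invented feature name:

```python
import hashlib

def in_experiment(user_id, feature, exposure_pct):
    """Deterministically assign a user to a feature's treatment group.
    Hashing (feature, user_id) yields a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < exposure_pct

# Expose a hypothetical "new_audit_ui" feature to roughly 10% of users:
treated = [u for u in range(1000) if in_experiment(u, "new_audit_ui", 10)]
```

Everyone outside `treated` serves as the control group, which is what makes the before/after comparison of user metrics statistically meaningful.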

Within companies like Facebook and Netflix, there are entire teams who work on services to handle statistical significance, bias, control groups, and the results of features. While much of the intelligence on feature flags has been kept proprietary at both companies, Netflix has open sourced a tool called Archaius, which allows for dynamic configuration of applications and can be used to expose features at run time.

Archaius is analogous to Consul, which also allows for dynamic configuration. These tools aid in changing the configuration, but they will not help with the need to analyze those changes and make the necessary decisions.
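To show what “dynamic configuration” means here, the toy class below stands in for a service like Archaius or Consul: callers read current values at run time, and an operator can flip a value without a redeploy. This is a simplification of either tool's actual API; the key name is invented.

```python
class DynamicConfig:
    """Toy stand-in for a dynamic-configuration service: values can be
    read at run time and changed without redeploying the application."""
    def __init__(self, defaults):
        self._values = dict(defaults)

    def get(self, key, default=None):
        return self._values.get(key, default)

    def set(self, key, value):
        # In a real system this update would arrive from the config service.
        self._values[key] = value

config = DynamicConfig({"feature.new_audit_ui.enabled": False})
print(config.get("feature.new_audit_ui.enabled"))  # False
config.set("feature.new_audit_ui.enabled", True)   # flipped at run time
print(config.get("feature.new_audit_ui.enabled"))  # True
```

Note that nothing here measures the effect of flipping the flag: as the paragraph above says, the analysis and the go/no-go decision still live outside the configuration tool.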

There are off-the-shelf tools that handle feature flagging and measure the results: tools like Optimizely and LaunchDarkly let developers hide or modify features and run experiments on a small population to measure the results. This gives the product team the information it needs to decide whether to move forward with 100% exposure of the feature.

Optimizely is well suited for web-based features and experiments, and it allows anybody to easily create new experiments through the UI. It also handles the intelligence to calculate results through its dashboard.

LaunchDarkly is “feature flags as a service.” It helps developers release features independently of deployments: as long as the feature is already deployed, it can stay hidden from users until the product team is ready to release it and measure its results. This tool is more likely to be integrated with your code at an API level, but it also provides an experiments framework.

The Right Tools for the Right Problem

Ultimately, feature flags and canarying set out to solve two different problems. Both are part of a greater product and engineering strategy to achieve an amazing customer experience. Using each one where appropriate will allow your teams to deploy faster with greater levels of confidence. Here are some guidelines to contrast feature flags and canaries.
