Feature Flags vs Canaries

May 2, 2017 by Armory

Feature flags and canaries have different objectives despite appearing similar: feature flags expose a specific feature to a sample set of users and then measure its impact, while canaries expose a specific version of the entire application to a sample set of users and then measure its impact.
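To make the feature-flag half of that definition concrete, here is a minimal sketch of how a flag might expose a feature to a sample of users. The feature name "new-audit-ui" and the bucketing scheme are illustrative assumptions, not part of any particular product.

```python
import hashlib

def is_enabled(feature: str, user_id: str, rollout_pct: int) -> bool:
    """Deterministically bucket a user into a feature rollout.

    Hashing feature + user keeps a user's bucket stable across requests,
    so the same user always sees the same variant of the feature.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # bucket in [0, 100)
    return bucket < rollout_pct

# Expose the hypothetical "new-audit-ui" feature to 10% of users.
exposed = [u for u in ("user-1", "user-2", "user-3")
           if is_enabled("new-audit-ui", u, 10)]
```

Deterministic hashing, rather than random sampling per request, is what lets you later compare the exposed group against a stable control group.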

So what’s the difference?

Here’s the high-level:

Feature Flags

Most companies use feature flags to understand the impact of a specific feature on user- or business-level metrics. For instance, does this new auditing feature affect our user engagement positively or negatively?

On the opposite end of the spectrum are canaries. When you execute a canary deployment, you're seeking consistency: you want metrics to remain within a standard deviation of normal. When you detect an anomaly, you typically roll back the change. You're also looking at signals like system- and network-level metrics to determine whether the deployment was a success.
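The "within a standard deviation" check described above can be sketched in a few lines. This is a deliberately simplified model (real canary analysis compares many metrics over time); the baseline error rates below are made-up numbers.

```python
from statistics import mean, stdev

def canary_healthy(baseline: list, canary_value: float, k: float = 1.0) -> bool:
    """Flag the canary unhealthy if its metric falls more than k
    standard deviations from the baseline fleet's mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(canary_value - mu) <= k * sigma

# Error rates from the existing instances vs. the canary instance.
baseline_error_rates = [0.010, 0.012, 0.011, 0.009, 0.010]
if not canary_healthy(baseline_error_rates, 0.048):
    print("anomaly detected: roll back the canary")
```

In a real pipeline, the unhealthy branch would trigger an automated rollback rather than a log line.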

Canaries: Best Practices & Tools

In practice, canaries are most commonly run through manual processes: an instance is manually rolled forward to the new version while a team watches multiple metrics dashboards for the instance's health. If all goes well, the team may deploy to a few more servers and repeat the process. The downside of this approach is that it is hard to determine which metrics matter and what to do in the event of failure.

During a deployment, hours are wasted agreeing that there is an issue and even more time deciding what to do about it. Automating this process is further complicated by the assumption that the underlying mechanism to deploy, and to roll back, is reproducible, safe, and consistent. This is exactly why Netflix moved away from this style of deployment to an automated method: their automated systems look at over 1,000 metrics in real time when deploying the Netflix API.

Consistent Deployments

The best way to achieve these goals is to use immutable infrastructure: it limits the risk of external resources being unavailable at run time and builds confidence in the immutable package itself. We believe Spinnaker is the best way to get an initial, solid deployment. It becomes the basis of any deployment because you can make assumptions about deploying an AMI and keep application configuration logic separate from deployment logic.

Once you have a predictable deployment model, the next step is to instrument your applications and environments. An intelligent deployment system needs lots of information to make good decisions. While unit tests and regression/integration tests provide a solid base for confidence in a deployment, other tools give you better ROI for the additional confidence they add.
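As a sketch of what "instrumenting the application" can mean at its simplest, here is an in-process metrics registry that records counters and call timings. The metric names are hypothetical; in production these values would be shipped to a time-series store that the deployment system queries, not held in memory.

```python
import time
from collections import defaultdict

class Metrics:
    """Minimal in-process instrumentation: counters and timings that an
    automated canary system (or a human) could later inspect."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timings = defaultdict(list)

    def incr(self, name, value=1):
        self.counters[name] += value

    def time_call(self, name, fn, *args):
        """Time a call and record failures as a separate error counter."""
        start = time.perf_counter()
        try:
            return fn(*args)
        except Exception:
            self.incr(f"{name}.error")
            raise
        finally:
            self.timings[name].append(time.perf_counter() - start)

metrics = Metrics()
metrics.time_call("audit.lookup", lambda uid: uid.upper(), "user-1")
metrics.incr("audit.requests")
```

The point is less the mechanism than the habit: every feature ships with the counters and latencies needed to judge its own health.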

More focus on Instrumentation, Less on Integration Testing

When asked what makes a deployment "safe," the easy answer is to add more integration tests. While you can theoretically reach 100% coverage with integration and regression tests, the cost of doing so is very high. Past a set of core integration tests, each additional test provides diminishing value but costs the same to create and maintain. In most cases, teams end up maintaining a second application (the integration suite) to test the first, because their integration tests become so complex. Maintaining those tests gets harder over time, the tests fall behind as the application progresses, and the application is left with no meaningful coverage.

An alternative to this approach is to use canaries, but in most cases applications and systems don't have enough instrumentation in place to signal an automated control system (or a human) that a deployment is heading for failure. Canarying is now a common deployment strategy within platforms like Kubernetes, but Kubernetes, like any other system, can't know what a "good deployment" is; that definition is specific to your application and organization. This is why instrumentation is so critical to the success of your deployments.

Feature Flags: Best Practices & Tools

Feature flags are a popular technique at companies like Facebook: they make it possible to ship hundreds of features in a single version of the application, expose each feature to a sample of the population, measure it against a control group, and make a product decision about whether the feature should be exposed to everyone.
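The "measure it against a control group" step typically reduces to a statistical comparison between the exposed and control populations. A minimal sketch, using a standard two-proportion z-test on made-up conversion counts (the sample sizes and conversion numbers below are illustrative, not real data):

```python
from math import sqrt

def z_score(conv_control: int, n_control: int,
            conv_treatment: int, n_treatment: int) -> float:
    """Two-proportion z-test: is the treatment group's conversion rate
    significantly different from the control group's?"""
    p_c = conv_control / n_control
    p_t = conv_treatment / n_treatment
    p = (conv_control + conv_treatment) / (n_control + n_treatment)  # pooled rate
    se = sqrt(p * (1 - p) * (1 / n_control + 1 / n_treatment))       # standard error
    return (p_t - p_c) / se

# Control: 200 conversions out of 10,000. Treatment: 260 out of 10,000.
z = z_score(200, 10_000, 260, 10_000)
significant = abs(z) > 1.96  # roughly 95% confidence for a two-sided test
```

This is the kind of calculation the dedicated experimentation teams mentioned below wrap with guards for bias, sample size, and multiple comparisons.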

Within companies like Facebook and Netflix there are entire teams who work on services to handle statistical significance, bias, control groups, and the results of feature experiments. While much of the intelligence around feature flags has been kept proprietary at both companies, Netflix has open sourced a tool called Archaius, which allows dynamic configuration of applications and can be used to expose features at run time.

Archaius is analogous to Consul in that both allow dynamic configuration. These tools aid in changing configuration, but they will not help you analyze those changes and make the necessary decisions.
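To illustrate the dynamic-configuration pattern that Archaius and Consul provide, here is a toy sketch: readers always see the latest value, and a refresh can swap the configuration without redeploying. This is a simplified stand-in, not either tool's actual API; in production the refresh would poll a config store on an interval.

```python
import json
import threading

class DynamicConfig:
    """Archaius/Consul-style dynamic properties: values can change at
    run time, and every reader sees the latest snapshot."""

    def __init__(self, initial: dict):
        self._lock = threading.Lock()
        self._values = dict(initial)

    def get(self, key, default=None):
        with self._lock:
            return self._values.get(key, default)

    def refresh(self, raw: str):
        """Replace the config snapshot (simulating a poll of a config store)."""
        new_values = json.loads(raw)
        with self._lock:
            self._values = new_values

config = DynamicConfig({"new-audit-ui.enabled": False})
config.refresh('{"new-audit-ui.enabled": true}')  # flip the flag at run time
```

Note what is missing: nothing here measures the effect of flipping the flag, which is exactly the analysis gap these tools leave open.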

There are off-the-shelf tools that help you create feature flags and measure their results. Tools like Optimizely and LaunchDarkly let developers hide or modify features to run experiments against a small population and measure the outcome. This gives the product team the information to decide whether to move to 100% exposure of the feature.

Optimizely is well suited to web-based features and experiments, allowing anybody to easily create new experiments through its UI. It also handles the statistics, calculating results in its dashboard.

LaunchDarkly is "feature flags as a service." It helps developers release features independently of deployments: as long as the feature is already deployed, it can be hidden from users until the product team is ready to release it and measure its results. This tool is more likely to be integrated with your code at an API level, but it also provides an experiments framework.

The Right Tools for the Right Problem

Ultimately, feature flags and canarying set out to solve two different problems. Both are part of a greater product and engineering strategy to achieve an amazing customer experience, and using each one where appropriate will allow your teams to deploy faster with greater confidence.
