May 2, 2017 by Isaac Mosquera
Feature flags and canaries have different objectives despite appearing similar: feature flags expose a specific feature to a sample set of users and then measure its impact. Canaries expose a specific version of the entire application to a sample set of users and then measure its impact.
Here’s the high-level view:
Most companies use feature flags to understand the impact of a specific feature on user or business level metrics. For instance, how does this new auditing feature impact our user engagement, positively or negatively?
At the opposite end of the spectrum are canaries. When you execute a canary deployment, you’re seeking consistency and looking for metrics to remain within a standard deviation. When you detect an anomaly, you typically roll back the change. You’re also looking at signals like system and network level metrics to determine whether the deployment was a success.
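The "within a standard deviation" check can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular canary tool works: the function names, the metric names, and the threshold of one standard deviation are all hypothetical, and real systems use more robust statistics.

```python
from statistics import mean, stdev


def canary_within_baseline(baseline_samples, canary_samples, max_deviations=1.0):
    """Return True if the canary mean is within max_deviations baseline
    standard deviations of the baseline mean (hypothetical rule)."""
    baseline_mean = mean(baseline_samples)
    baseline_std = stdev(baseline_samples)  # needs at least two samples
    canary_mean = mean(canary_samples)
    if baseline_std == 0:
        # A perfectly flat baseline: any difference counts as an anomaly.
        return canary_mean == baseline_mean
    return abs(canary_mean - baseline_mean) <= max_deviations * baseline_std


def evaluate_canary(metrics, max_deviations=1.0):
    """metrics maps a metric name to (baseline_samples, canary_samples).
    Returns ("promote" or "rollback", list of failing metric names)."""
    failing = [
        name
        for name, (baseline, canary) in metrics.items()
        if not canary_within_baseline(baseline, canary, max_deviations)
    ]
    return ("rollback" if failing else "promote", failing)
```

Calling `evaluate_canary` with one entry per system or network metric gives a single decision plus the list of metrics that drifted, which is the information a team would otherwise read off dashboards by hand.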
In practice, canaries are most commonly used through manual processes: an instance is manually rolled forward to the new version while a team watches multiple metrics dashboards for the instance’s health. If all goes well, the team may choose to deploy to a few more servers, then repeat the process. The downside of this approach is that it is hard to determine which metrics matter and what to do in the event of failure.
During a deployment, hours are wasted agreeing that there is an issue, and even more time is spent deciding what to do about it. The journey to automating this process is further complicated by the assumption that the underlying mechanism to deploy (and roll back) is reproducible, safe, and consistent. This is exactly why Netflix moved away from this style of deployment to an automated method. Their automated systems look at over 1,000 metrics in real time when deploying the Netflix API!
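The manual "deploy to one instance, watch, deploy to a few more, repeat" loop can be sketched as an automated staged rollout. This is only an illustration and not Netflix's system: `deploy`, `is_healthy`, and `rollback` are hypothetical callbacks you would supply, and the sketch assumes each of them is reproducible and safe to call.

```python
from typing import Callable, List, Sequence, Tuple


def staged_rollout(
    instances: Sequence[str],
    stage_sizes: Sequence[int],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> Tuple[bool, List[str]]:
    """Deploy in stages; roll back everything updated so far on failure.

    Returns (True, updated instances) on success, or
    (False, unhealthy instances) after rolling back.
    """
    remaining = list(instances)
    updated: List[str] = []
    for size in stage_sizes:
        batch, remaining = remaining[:size], remaining[size:]
        for instance in batch:
            deploy(instance)
            updated.append(instance)
        # Re-check every updated instance, not only the newest batch.
        unhealthy = [i for i in updated if not is_healthy(i)]
        if unhealthy:
            for instance in reversed(updated):
                rollback(instance)
            return False, unhealthy
    return True, updated
```

The decision about what to do on failure is made once, in code, rather than debated during the deployment.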
The best way to achieve these goals is to use immutable infrastructure: it limits the risk of external resources not being available at run time, and it also creates confidence in the immutable package itself. Of course, we believe that Spinnaker is the best way to get an initial solid deployment. It becomes the basis of any deployment since you can make assumptions about deploying an AMI and having application configuration logic kept separate from deployment logic.
Once you have a predictable deployment model, the next step is to instrument the applications and environments. In order to have an intelligent deployment, you’ll need lots of information for the system to make good decisions. While unit tests and regression/integration tests provide a solid base for gaining confidence in the deployment, there are many other tools that give you a better ROI for gaining additional confidence.
When asked about what makes a deployment “safe,” the easy answer is to add more integration tests. While theoretically you can reach 100% coverage with integration and regression tests, the cost of doing so is very high. After a set of core integration tests, each additional test provides diminishing value but costs the same to create and maintain. In most cases, teams find themselves maintaining a second application (the integration suite) to test their applications because their integration tests become so complex. Maintaining integration tests becomes difficult over time, and soon the tests are left behind as the application continues to progress, which leaves the application with no coverage.
An alternative to this approach is to use canaries, but in most cases applications and systems don’t have enough instrumentation in place to signal an automated control system (or a human) that the deployment is heading for failure. Canary-ing is now a common deployment strategy within new PaaS frameworks like Kubernetes. However, Kubernetes or any other system won’t know what a “good deployment” is, as that is very specific to your application and organization. This is why instrumentation is so critical to the success of your deployments.
Creating feature flags is a popular method used by companies like Facebook to deploy hundreds of features with a single version of the application. Only some features are exposed to a sample of the population and measured against a control group, and the results drive a product decision as to whether each feature should be exposed to the entire population.
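One common way to expose a feature to a stable sample of users is to hash each user into a bucket. The sketch below is a generic illustration, not Facebook's implementation; the function names, the flag name, and the choice of SHA-256 are assumptions.

```python
import hashlib


def bucket(user_id: str, flag_name: str) -> float:
    """Map a (flag, user) pair to a stable number in [0, 1)."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    # Use the first 32 bits of the hash as a uniform-looking integer.
    return int(digest[:8], 16) / 0x100000000


def is_enabled(user_id: str, flag_name: str, rollout_fraction: float) -> bool:
    """True if this user falls inside the exposed sample for the flag.

    Users outside the sample act as the control group.
    """
    return bucket(user_id, flag_name) < rollout_fraction
```

Because the bucket depends only on the flag name and user ID, a user sees the same variant on every request, and raising `rollout_fraction` only adds users to the exposed group without reshuffling existing ones.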
Within companies like Facebook and Netflix, there are entire teams who work on services to handle the statistical significance, bias, control, and results of features. While much of the intelligence on feature flags has been kept proprietary at both companies, Netflix has open sourced a tool called Archaius, which allows for dynamic configuration of applications and can be used to expose features at run time.
Archaius is analogous to Consul, which also allows for dynamic configuration. These tools aid in changing the configuration, but they will not help with the need to analyze those changes and make the necessary decisions.
There are off-the-shelf tools that help with feature flagging and measuring the results. Tools like Optimizely and LaunchDarkly allow developers to hide or modify features in order to run experiments on a small population and measure the results. This then gives the product team information on whether to move forward with 100% exposure of the feature.
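To give a sense of what "measuring the results" involves, here is a minimal sketch of a two-proportion z-test that compares a conversion rate between a control group and the group exposed to a feature. It is a textbook calculation under simplifying assumptions (independent users, large samples), not the method used by Optimizely or LaunchDarkly, and the function name is hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(control_conversions, control_users,
                          treatment_conversions, treatment_users):
    """Return (z, two-sided p-value) for treatment vs. control rates."""
    p_control = control_conversions / control_users
    p_treatment = treatment_conversions / treatment_users
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (control_conversions + treatment_conversions) / (
        control_users + treatment_users
    )
    std_error = sqrt(pooled * (1 - pooled) * (1 / control_users + 1 / treatment_users))
    if std_error == 0:
        return 0.0, 1.0  # no variation at all, so no evidence of a difference
    z = (p_treatment - p_control) / std_error
    # Standard normal CDF via the error function.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value
```

A small p-value suggests the difference between the two groups is unlikely to be noise; what threshold counts as "small enough" to ship the feature remains a product decision.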
Optimizely is well suited for web-based features and experiments, and it allows anybody to easily create new experiments through the UI. It also handles the intelligence to calculate results through its dashboard.
LaunchDarkly is “Feature Flag As A Service”. It helps developers release features independently of deployments. As long as the feature is already deployed, it can be hidden from users until the product team is ready to release the feature and measure its results. This tool is more likely to be integrated with your code at an API level, but it also provides an experiments framework.
Ultimately, feature flags and canary-ing techniques set out to solve two different problems. Both are part of a greater product and engineering strategy to achieve an amazing customer experience. Using each one where appropriate will allow your teams to deploy faster with greater levels of confidence. Here are some guidelines to contrast feature flags and canaries.