Reduce the Blast Radius of a Bad Deployment with Automated Canary Analysis
May 23, 2022 by Stephen Atwell
Software deployment processes differ across organizations, teams, and applications. The most basic, and perhaps the riskiest, is the “big bang deployment.” This strategy updates all nodes within the target environment simultaneously with the new software version.
This deployment strategy causes many problems, including potential downtime and other disruptions while the update is in progress. Rolling back can also be difficult when problems arise in production.
A big bang deployment can have devastating effects. For example, it can raise issues in production or even take down the system. Since rollbacks aren’t easy, production issues can cause a prolonged, catastrophic outage.
Overall, a big bang deployment throttles the delivery pipeline, causing deployments, bug fixes, and rollbacks to take too long and be too complex.

Although bad deployments cause challenges, automated canary analysis helps. Let’s explore how to reduce the blast radius of a bad deployment using Armory — based on open source Spinnaker — and Kayenta for automated canary analysis.
Safe and Reliable Deployments with Armory
Releasing lousy software can have a long-lasting, damaging impact on any business. Armory helps you overcome this by empowering developers with tools for automated, intelligent software delivery. Intelligent software delivery prevents large-scale outages and minimizes the blast radius of bad deployments.
DevOps teams can define Armory pipelines using templates to encourage reuse and continuous delivery (CD). These pipelines enforce best practices from Netflix, Google, and the open-source software (OSS) community. They also reduce the risk of bad deployments and large-scale outages.
Armory minimizes the impact of bad deployments using a 1-click rollback when a new deployment causes errors. Our platform also uses the automated canary analysis technique to test new releases before deploying them to production.
With this technique, the new changes deploy to a small set of users before the full release. The software automatically promotes or fails the deployment based on predefined metrics.
What is Canary Deployment?
The idea of canary deployment is based on canaries in coal mines. Miners took these birds into the mine to measure the amount of toxic gas present. Canaries are more sensitive to dangerous gases than humans, so miners would know hazardous gases were likely present if a canary died inside the mine.
Software deployment uses a version of this strategy. Instead of birds, the canary is the new software version. DevOps teams roll out a new application version in stages, deploying it to a small subset of the production infrastructure and enabling the latest version for a small set of users who help identify issues.
When ready, DevOps teams deploy the software version to a larger subset of the infrastructure and a larger set of users, and so on, until the rollout is complete. This strategy dramatically reduces the risk associated with deploying a new software version into production.
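The staged rollout described above can be sketched in a few lines of Python. This is a minimal illustration, not Armory's or Spinnaker's implementation: the function name `staged_rollout`, the stage percentages, and the `is_healthy` callback are all hypothetical stand-ins for whatever traffic shifting and health checks a real pipeline would perform.

```python
from typing import Callable, Sequence, Tuple


def staged_rollout(
    stages: Sequence[int],
    is_healthy: Callable[[int], bool],
) -> Tuple[bool, int]:
    """Walk through traffic percentages, stopping at the first unhealthy stage.

    Returns (completed, last_healthy_percent).
    """
    last_healthy = 0
    for percent in stages:
        # A real system would shift this share of traffic to the new version here.
        if not is_healthy(percent):
            # Stop the rollout; the caller would route traffic back to the stable version.
            return False, last_healthy
        last_healthy = percent
    return True, last_healthy


# Hypothetical health check: the new version starts failing above 25% traffic.
completed, reached = staged_rollout([5, 25, 50, 100], lambda pct: pct <= 25)
print(completed, reached)  # False 25
```

Because the problem surfaces at the 50% stage, the rollout stops before the final stage, and half of the traffic never sees the bad version.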

Manual Versus Automated Canary Analysis
When a DevOps team manually performs this deployment strategy, they deploy a new version of the application to a small subset of production servers and shift a small percentage of traffic to the latest version. The rest of the servers and traffic remain unchanged.
The team then looks at graphs and logs and monitors multiple metrics to determine the server health with the new version. If the results are within acceptable values, then the team deploys to a larger set of servers and traffic and repeats the process. Otherwise, they roll back the deployment and route all the traffic to the stable servers.
As we can see, manual canary analysis is tedious and time-consuming. It is also prone to human error and doesn’t scale well when deploying to many servers several times a week, or even several times a day.
Automated canary analysis overcomes this by, well, automation. Automation makes fetching metrics and running statistical tests more accurate and less time-consuming.
Automated Canary Analysis using Armory Continuous Deployment-as-a-Service
In addition to Spinnaker, Armory has a new lightweight SaaS offering that supports canary deployments to Kubernetes that leverage automated canary analysis. This offering allows you to easily trigger a canary deployment from your existing deployment tooling, providing a CLI that is a drop-in replacement for kubectl.
CD-as-a-Service allows you to specify a set of queries for your metric provider and a threshold range for each query. If a metric comes back outside its threshold range, the analysis is considered a failure. For example, you can require that CPU usage stays below X% and memory usage stays below Y% during an analysis.
CD-as-a-Service provides a prescriptive canary strategy that can move pods to the new version one at a time and check all metrics over a given amount of time before continuing to additional pods. This allows you to check the health of each pod as it is provisioned and trigger a rollback if any problems are detected.
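The threshold check described above might look roughly like the following Python sketch. It does not use the CD-as-a-Service configuration format or API. The metric names, the bounds, and the functions `failed_metrics` and `analysis_passes` are hypothetical, and treating a missing metric as a failure is an assumption made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical thresholds: metric name -> (lower bound, upper bound), in percent.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "cpu_percent": (0.0, 80.0),
    "memory_percent": (0.0, 70.0),
}


def failed_metrics(
    observed: Dict[str, float],
    thresholds: Dict[str, Tuple[float, float]],
) -> List[str]:
    """Return the names of metrics that are missing or outside their range."""
    failures = []
    for name, (low, high) in thresholds.items():
        value = observed.get(name)
        # Assumption: a metric with no value counts as a failure.
        if value is None or not (low <= value <= high):
            failures.append(name)
    return failures


def analysis_passes(
    observed: Dict[str, float],
    thresholds: Dict[str, Tuple[float, float]] = THRESHOLDS,
) -> bool:
    """The analysis fails if any single metric is out of range."""
    return not failed_metrics(observed, thresholds)


print(analysis_passes({"cpu_percent": 42.0, "memory_percent": 55.0}))  # True
print(failed_metrics({"cpu_percent": 91.5, "memory_percent": 55.0}, THRESHOLDS))  # ['cpu_percent']
```

Returning the list of failing metrics, rather than only a boolean, makes it easier to report why a rollback was triggered.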
In addition to automated canary analysis, Continuous Deployment-as-a-Service also supports running your existing custom automation during a canary deployment strategy. You can use this, for example, to query Elasticsearch and check your logs for errors. If errors occur, CD-as-a-Service can roll back the change; if there are none, it finishes scaling up traffic.
Armory is still seeking additional design partners to help shape project CD-as-a-Service, but spots are limited and filling up fast, so sign up today.
Automated Canary Analysis with Spinnaker and Kayenta
The automated canary analysis (ACA) platform Kayenta integrates with Armory, which is based on the open-source multi-cloud continuous delivery platform Spinnaker. You can set up an automated canary analysis stage in a Spinnaker pipeline and use Kayenta to assess the canary’s risk. Kayenta fetches user-configured metrics, runs statistical tests to check for degradation between the new and old versions, and provides an aggregate score for the canary.
Based on the score, Kayenta automatically judges whether to promote or fail the canary or prompt human intervention. It does this in two phases: metric collection and judgment.
Armory implements canary analysis by running three clusters in parallel: production, canary, and baseline. Comparing the production cluster with the canary deployment wouldn’t produce reliable results because of long-running process effects. For this reason, Armory creates a baseline cluster where it deploys the application’s production version.
The platform then compares the canary cluster against the baseline cluster. The canary and baseline clusters each receive a small percentage of the traffic, while the rest goes to the production cluster. Armory then handles the lifecycle of the canary and baseline clusters.
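As a rough illustration of the three-cluster arrangement, the sketch below computes a traffic split in Python. The function name `traffic_split` is hypothetical, and giving the canary and baseline equal shares is an assumption made here so the two clusters are compared under similar load; the text above says only that each receives a small percentage.

```python
from typing import Dict


def traffic_split(canary_percent: float) -> Dict[str, float]:
    """Give canary and baseline equal shares; production takes the remainder."""
    if not 0 < canary_percent < 50:
        raise ValueError("canary_percent must be between 0 and 50 (exclusive)")
    return {
        "production": 100 - 2 * canary_percent,
        "baseline": canary_percent,  # current production version, freshly deployed
        "canary": canary_percent,    # new version under test
    }


print(traffic_split(5))  # {'production': 90, 'baseline': 5, 'canary': 5}
```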

Metric Collection (Retrieval)
In Armory’s canary pipeline stage, DevOps teams can specify the metrics to check and their sources. Armory supports Stackdriver, Prometheus, Datadog, SignalFx, and New Relic. When Armory uses different sources, it combines the various metrics into a single analysis. However, it’s considered best practice to avoid adding too many metrics in one group.
Armory retrieves these metrics from the baseline and canary clusters, tags them (baseline or canary), and stores them in a time-series database. It then passes the results to the canary judge for analysis.
Judgment
The judgment stage compares the baseline and canary results from the collection stage, individually evaluates each metric, and performs statistical tests. The output is an aggregate score ranging from 0 to 100. This score falls into one of three categories:
- Success: The judge promotes the canary to production.
- Marginal: The judge needs human intervention to make a decision.
- Failure: The judge recommends stopping the pipeline, performing a rollback, and directing traffic to production.
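The mapping from score to category can be sketched as below. The two cut-off values are hypothetical parameters chosen for illustration, since the text above does not state where the boundaries lie; in practice they would be configured per pipeline.

```python
def classify_score(score: float, marginal: float = 50.0, passing: float = 75.0) -> str:
    """Map an aggregate canary score (0-100) to success, marginal, or failure.

    `marginal` and `passing` are hypothetical thresholds, not documented defaults.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if marginal > passing:
        raise ValueError("marginal threshold must not exceed passing threshold")
    if score >= passing:
        return "success"   # promote the canary
    if score >= marginal:
        return "marginal"  # ask a human to decide
    return "failure"       # stop the pipeline and roll back


for s in (90, 60, 30):
    print(s, classify_score(s))
# 90 success
# 60 marginal
# 30 failure
```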
The judgment stage has four main steps:
- Data validation: This ensures there is valid data for the baseline and canary metrics before analysis. If the required data is not available for either the baseline or the canary or both, Armory labels the metric NODATA and moves the analysis to the next metric.
- Data cleaning: This prepares the raw data for comparison. This preparation includes handling and sanitizing missing values and removing outliers.
- Metric comparison: This compares the canary and the baseline for each metric. It classifies each metric as pass, high, or low, indicating the difference between the canary and the baseline.
- Score computation: This computes the final score based on the metric classifications. The score is the number of pass metrics divided by the total number of metrics, expressed as a percentage.
The calculation for the final score is:
(pass metrics / total number of metrics) * 100
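To make the four steps concrete, here is a simplified Python sketch of the judgment flow. It is not Kayenta's judge. Two techniques are swapped in for illustration: outliers are dropped with a median-absolute-deviation filter, and the metric comparison uses a fixed 10% tolerance on the mean instead of a statistical test. The function names and sample data are hypothetical, and counting NODATA metrics toward the total is an assumption.

```python
from statistics import mean, median
from typing import List, Optional

NODATA, PASS, HIGH, LOW = "nodata", "pass", "high", "low"


def clean(samples: List[Optional[float]]) -> List[float]:
    """Data cleaning: drop missing values, then drop points far from the median."""
    present = [s for s in samples if s is not None]
    if len(present) < 3:
        return present
    mid = median(present)
    spread = median([abs(s - mid) for s in present])  # median absolute deviation
    if spread == 0:
        return present
    return [s for s in present if abs(s - mid) <= 5 * spread]


def classify(
    baseline: List[Optional[float]],
    canary: List[Optional[float]],
    tolerance: float = 0.10,
) -> str:
    """Data validation plus metric comparison for a single metric."""
    base, cand = clean(baseline), clean(canary)
    if not base or not cand:
        return NODATA  # validation failed: nothing to compare
    base_mean, canary_mean = mean(base), mean(cand)
    allowed = tolerance * abs(base_mean)
    if canary_mean > base_mean + allowed:
        return HIGH
    if canary_mean < base_mean - allowed:
        return LOW
    return PASS


def canary_score(classifications: List[str]) -> float:
    """Score computation: (pass metrics / total number of metrics) * 100."""
    if not classifications:
        raise ValueError("no metrics to score")
    # Assumption: NODATA metrics still count toward the total.
    return classifications.count(PASS) / len(classifications) * 100


# Hypothetical samples: metric name -> (baseline samples, canary samples).
metrics = {
    "latency_ms": ([100, 102, 98, None, 101], [101, 99, 103, 100, 500]),
    "error_rate": ([0.01, 0.012, 0.011], [0.03, 0.031, 0.029]),
    "cpu_percent": ([40, 42, 41], [41, 40, 42]),
    "queue_depth": ([None, None], [3, 4]),
}
results = {name: classify(b, c) for name, (b, c) in metrics.items()}
print(results)
print(canary_score(list(results.values())))  # 50.0
```

In this run, latency and CPU pass, the error rate is classified as high, and the queue depth has no baseline data, so two of the four metrics pass and the score is 50.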

Next Steps
Contact Armory today for a complimentary assessment of your software delivery practices and learn more about how your organization can benefit from safe, reliable deployments.