Leverage Advanced Deployment Strategies to Change How You Ship Software
Jun 4, 2018 by Ben Mappen
Traditionally, deploying changes to a company’s applications has been a difficult and scary affair. This has motivated companies to build monolithic apps that are updated as infrequently as possible, with each update being a complicated fire drill coordinated across many teams (and often done on Friday night, so teams can spend the weekend cleaning up the mess before Monday morning hits).
Worse still, when an application fails globally in production due to an update, the human response is to halt the production line and add more layers of manual QA, integration testing, and manual judgments (aka human approvals) before deploying again. This slows companies down even further while increasing costs. Stifled innovation is the tremendous opportunity cost of this false feeling of security.
Imagine flipping this dynamic on its head. Maybe the answer isn’t to deploy less often. Maybe, instead, it’s to deploy more often. To use sophisticated deployment techniques to limit your blast radius when failures do happen, and automation to build quality into deployments vs. trying to clean up the mess in production.
Imagine a world where you could deploy software continuously to production, like running water. Where each deployment was a non-event because it happened dozens, hundreds or even thousands of times every day — instead of huge, scary deployment events once every other month. Imagine how productive — and happy — your application developers and product teams would be to get their code into the world immediately, and the fast feedback loops they could achieve with your customers.
Armory’s Platform, powered at the core by Spinnaker, enables such a world. It completely changes how a company thinks about shipping software.
Spinnaker’s Advanced Deployment Strategies
Here’s an in-depth post that describes Spinnaker’s advanced deployment techniques in more detail. These techniques sit at the core of re-imagining how your company gets its applications into production safely and with velocity.
Different tools advertise themselves as continuous delivery (CD) or deployment tools. However, some are more robust than others. Robust CD requires putting meticulous care into the deployment process, verification, and potential rollback. This means meeting requirements like zero downtime deployments, manual or automatic canary verification, handling stalled or failed deployments, and more. Not every tool meets these requirements.
Spinnaker is a delivery platform originally developed by Netflix to meet and exceed these requirements at scale. Netflix was one of the first companies to adopt cloud computing and push CD in the industry. Netflix designed Spinnaker’s resilient deployment process by implementing deployment strategies. The built-in deployment strategies support different use cases, all while promoting production stability. This post dives into the specifics about Spinnaker’s different deployment strategies and when to use them.
If you’d like to learn more about Spinnaker, we recommend reading this O’Reilly Book.
The Highlander deployment strategy is aptly named after the film Highlander because of the famous line, “there can be only one.” With this strategy, a load balancer fronts a single cluster, and the previous cluster is destroyed once the new deployment completes. This is the simplest strategy, and it works well when rollback speed is unimportant or infrastructure costs need to be kept down.
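The Highlander flow can be sketched in a few lines of Python. This is a toy model, not Spinnaker code: `LoadBalancer` and `highlander_deploy` are hypothetical names invented for illustration, and a "cluster" is represented by nothing more than a version string.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoadBalancer:
    """Hypothetical stand-in for a load balancer fronting clusters."""
    clusters: List[str] = field(default_factory=list)


def highlander_deploy(lb: LoadBalancer, new_version: str) -> List[str]:
    """Bring up the new cluster, then destroy every older one.

    Returns the versions that were destroyed.
    """
    previous = list(lb.clusters)
    lb.clusters.append(new_version)   # the new cluster starts taking traffic
    destroyed = []
    for old in previous:              # "there can be only one"
        lb.clusters.remove(old)
        destroyed.append(old)
    return destroyed


lb = LoadBalancer(clusters=["v1"])
print(highlander_deploy(lb, "v2"))  # ['v1']
print(lb.clusters)                  # ['v2']
```

Because the old cluster is gone once the deployment completes, rolling back in this model means running a full deployment of the old version again.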
The red/black deployment strategy is also referred to as blue/green. It uses a load balancer and two target clusters (known as red/black or blue/green). The load balancer routes traffic to the active (enabled) cluster while a new deployment replaces the servers in the disabled cluster. When the disabled cluster is ready, the load balancer routes traffic to it, and the previously enabled cluster becomes disabled. That disabled cluster is kept around for the next deployment.
Red/black creates atomic deployments in the sense that all traffic goes from the previous version to the current version at the same time. This is useful for applications that can’t handle running multiple versions at once. Engineers must also consider pre-warming before traffic switches over. Given that all traffic switches at the same time, fresh application instances may be overwhelmed if they depend on full caches or other transient data.
This strategy is great for rollbacks since the previous running version is kept around as a “hot standby” but is not receiving active traffic. Rollbacks occur quickly by simply changing the enabled cluster behind the load balancer. This speed comes at a cost since the old infrastructure is kept around.
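A minimal sketch of the red/black switch and its rollback, assuming a hypothetical `RedBlackBalancer` class that tracks only which version is enabled and which is on standby. Real clusters, health checks, and pre-warming are left out.

```python
from typing import Optional


class RedBlackBalancer:
    """Hypothetical load balancer with one enabled and one standby cluster."""

    def __init__(self, enabled: str) -> None:
        self.enabled = enabled
        self.standby: Optional[str] = None

    def deploy(self, new_version: str) -> None:
        # The new version replaces whatever sat in the disabled cluster, then
        # all traffic cuts over at once. The old version becomes the standby.
        self.standby, self.enabled = self.enabled, new_version

    def rollback(self) -> None:
        # Rollback is only a swap of which cluster is enabled.
        if self.standby is None:
            raise RuntimeError("no standby cluster to roll back to")
        self.enabled, self.standby = self.standby, self.enabled


lb = RedBlackBalancer(enabled="v1")
lb.deploy("v2")
print(lb.enabled, lb.standby)  # v2 v1
lb.rollback()
print(lb.enabled, lb.standby)  # v1 v2
```

The rollback does no provisioning at all, which is why it is fast, and the standby attribute is the extra infrastructure that speed pays for.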
Rolling red/black is a slower red/black with more possible verification points. The process is the same as red/black, but the difference is in how traffic switches over. The above image illustrates this difference. Blue is the enabled cluster. Blue instances are gradually replaced by new instances in the green cluster until all enabled instances are running the newest version. The rollout may occur in 20% increments, so the blue/green split can be 80/20, 60/40, 40/60, 20/80, or 0/100. Both clusters receive traffic until the rollout is complete.
This strategy is slow for deployments and rollbacks, especially when there are large numbers of instances involved. The different stages offer far more verification possibilities throughout the process. This approach is especially valuable when verification must happen every step of the way.
Monitoring production load is a common use case for rolling red/black. Teams running large-scale, global applications often cannot accurately predict load impact until something goes into production. Therefore, it is critical that new deployments are tested under real load before rolling out to 100% of the infrastructure. The verification capability does come at a cost. The application must support two different versions running at once. This is a key design decision that must be handled at the application and database layers.
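The incremental rollout with a verification point at every step can be sketched as follows. The function name `rolling_red_black` and the `verify` callback are assumptions made for illustration. In a real pipeline the callback would be a monitoring check or a manual judgment.

```python
from typing import Callable, List, Tuple


def rolling_red_black(step_percent: int,
                      verify: Callable[[int], bool]) -> List[Tuple[int, int]]:
    """Shift traffic from the old cluster to the new one in fixed increments.

    Returns the (old %, new %) splits that were reached. If verify() rejects
    a step, the rollout aborts and all traffic goes back to the old cluster.
    """
    if step_percent <= 0 or 100 % step_percent != 0:
        raise ValueError("step_percent must evenly divide 100")
    history = []
    for new_share in range(step_percent, 101, step_percent):
        history.append((100 - new_share, new_share))
        if not verify(new_share):
            history.append((100, 0))  # abort: revert to the old cluster
            break
    return history


# Every step passes verification.
print(rolling_red_black(20, lambda share: True))
# [(80, 20), (60, 40), (40, 60), (20, 80), (0, 100)]

# Verification fails once the new cluster carries 60% of traffic.
print(rolling_red_black(20, lambda share: share < 60))
# [(80, 20), (60, 40), (40, 60), (100, 0)]
```

Every intermediate split is a moment when both versions serve real traffic, which is the compatibility requirement the strategy places on the application and database layers.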
A canary deployment is a process in which a change is partially deployed, then tested against baseline metrics before continuing. This process reduces the risk that a change will cause problems when it has been completely rolled out by limiting your blast radius to a small percentage of your user base. The baseline metrics are set when configuring the canary. Metrics may include error count or latency. Higher-than-baseline error counts or latency spikes kill the canary, and thus stop the pipeline.
Spinnaker canaries are stages. Progression through the canary stage can trigger deployments to subsequent stages using the above deployment strategies, or include even more canary stages. Spinnaker uses metrics in combination with a judge to determine if a canary passes or fails the stage. Spinnaker integrates with multiple monitoring systems to automate this process as much as possible. However, you’ll still need to configure which metrics to judge against.
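To make the idea of a judge concrete, here is a deliberately simple threshold comparison. It is not how Spinnaker's judge works internally. The function `judge_canary`, the metric names, and the 10% tolerance are all assumptions chosen for illustration, and a production judge would normally apply statistical tests rather than compare means.

```python
from statistics import mean
from typing import Dict, List


def judge_canary(baseline: Dict[str, List[float]],
                 canary: Dict[str, List[float]],
                 tolerance: float = 0.10) -> bool:
    """Pass only if every canary metric stays within tolerance of baseline.

    Each metric is treated as "lower is better" (error count, latency).
    """
    for name, baseline_samples in baseline.items():
        limit = mean(baseline_samples) * (1 + tolerance)
        if mean(canary[name]) > limit:
            return False  # a higher-than-baseline metric kills the canary
    return True


baseline = {"error_count": [2, 3, 2], "latency_ms": [120, 130, 125]}
healthy = {"error_count": [2, 2, 3], "latency_ms": [118, 131, 127]}
failing = {"error_count": [9, 8, 10], "latency_ms": [118, 131, 127]}

print(judge_canary(baseline, healthy))  # True
print(judge_canary(baseline, failing))  # False
```

A failing verdict is what would stop the pipeline before the change reaches the rest of the user base.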
Canaries are usually run against deployments containing changes to code, but they can also be used for operational changes, including changes to configuration.
The out-of-the-box strategies combined with canary stages should be enough for small, large, and enterprise teams. If these options do not work for you, then you can write your own. For example, you can implement the shadow deployment strategy. This is useful when a new version must undergo production observation without impacting production traffic. This strategy forwards traffic to both versions, but only the current version's responses reach users. However, it is complicated to set up and requires extra infrastructure. You may not need something like this, but the point is that you can implement custom strategies with Spinnaker.
Bear in mind that these deployment strategies target deploying applications. Robust deployments are integral to a strong CD pipeline, but they do not end there.
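The core of shadowing is that each request reaches both versions while users only ever see the current version's answer. The sketch below shows that idea in isolation. `shadow_request` is a hypothetical helper, the two versions are modeled as plain callables, and a real setup would mirror traffic at the network or proxy layer, typically asynchronously.

```python
from typing import Any, Callable, List, Tuple


def shadow_request(request: Any,
                   primary: Callable[[Any], Any],
                   shadow: Callable[[Any], Any],
                   mismatches: List[Tuple[Any, Any, Any]]) -> Any:
    """Serve the request from primary and mirror it to shadow.

    Differences and shadow failures are recorded for later inspection.
    They never affect the response that is returned to the user.
    """
    response = primary(request)
    try:
        shadow_response = shadow(request)
        if shadow_response != response:
            mismatches.append((request, response, shadow_response))
    except Exception as exc:  # a broken shadow must not break production
        mismatches.append((request, response, exc))
    return response


mismatches: List[Tuple[Any, Any, Any]] = []
current = lambda x: x * 2
candidate = lambda x: x ** 2  # new version with different behavior

print(shadow_request(3, current, candidate, mismatches))  # 6
print(mismatches)                                         # [(3, 6, 9)]
```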
Feature Flags: Deployment Strategies for Features
The deployment strategies mentioned above ensure that all servers update correctly and that the correct versions are running—but that is only half the battle. The second half comes in when launching features. Feature flags are like deployment strategies for functionality.
Here is a familiar scenario throughout IT: a team tested new features thoroughly, stakeholders verified everything is available in a staging environment, and plenty of QA staff beat the new code to death. Everything is good to go. The build is promoted to production. Then (as always), for an unexpected reason, everything goes sideways. Now, a rollback is required to quiet down the pagers. How could this situation be avoided?
Feature flags mitigate this scenario. Oftentimes, teams are caught off-guard by production behavior because there is no way to reproduce it in its entirety. New features alter existing performance characteristics or introduce new bottlenecks.
Feature flags may be used in different ways to hedge different risks. High-impact features can be enabled for company staff in production for testing before being flipped on for end users. Feature flags may work like the rolling red/black strategy, too. New features can be flipped on for increasing percentages of users while production is monitored for possible problems. If anything goes wrong, flip the flags off and start again.
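One common way to turn a feature on for a growing percentage of users is to hash each user into a stable bucket. The sketch below assumes that approach. The function `is_enabled` and the flag name are hypothetical and do not represent the API of any particular feature flag product.

```python
import hashlib


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Return True if this user falls inside the flag's rollout percentage.

    The bucket depends only on the flag name and the user ID, so a user who
    sees the feature at 20% keeps seeing it when the rollout grows to 50%.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable value in 0..99
    return bucket < rollout_percent


users = [f"user-{i}" for i in range(1000)]
for percent in (0, 20, 50, 100):
    enabled = sum(is_enabled("new-checkout", u, percent) for u in users)
    print(f"{percent:>3}% rollout: {enabled} of {len(users)} users enabled")
```

Setting the percentage back to 0 turns the feature off for everyone without a redeploy, which is the "flip the flags off" escape hatch described above.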
Feature flags do come at a cost. Each flag doubles the number of configurations the application can run in, so the branches multiply quickly. Consider an application with only two feature flags. This creates four possible versions of the application: one with both enabled, one with only feature A, one with only feature B, and one with neither. The last is simply the old behavior, but it still has to keep working. All versions must undergo integration testing with an eye for edge cases for when flags enable or disable features.
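Enumerating the combinations makes the testing cost concrete: n independent on/off flags give 2^n configurations. The flag names in this short sketch are placeholders.

```python
from itertools import product
from typing import Dict, List


def flag_combinations(flags: List[str]) -> List[Dict[str, bool]]:
    """Return every on/off combination of the given feature flags."""
    return [dict(zip(flags, values))
            for values in product([False, True], repeat=len(flags))]


for combo in flag_combinations(["feature_a", "feature_b"]):
    print(combo)
# {'feature_a': False, 'feature_b': False}
# {'feature_a': False, 'feature_b': True}
# {'feature_a': True, 'feature_b': False}
# {'feature_a': True, 'feature_b': True}

print(len(flag_combinations([f"flag_{i}" for i in range(10)])))  # 1024
```

A list like this can feed a parameterized integration test suite, although in practice teams usually test only the combinations they expect to see in production.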
Interested in Feature Flagging? We recommend contacting LaunchDarkly to learn more.
Although feature flagging can feel similar to canary deployments, the two are very different. Canaries are an engineering activity, and feature flags are a product activity. Companies can deploy continuously, using canaries to limit the blast radius of changes in production, while product teams use feature flags to decide which users should actually experience these changes. Here’s a blog post that covers the differences in more detail.
This post examined how Spinnaker’s deployment strategies fit into CD pipelines, and how they change the way your company ships software. The bundled strategies we reviewed in this post should cover use cases for a vast majority of teams. Highlander is fast, simple, and cost effective. Red/black (also called blue/green) requires a trade-off of slower deployments for faster rollbacks. Rolling red/black slowly rolls out changes to an increasing number of servers and provides more verification points to continue or abort the deployment. Canaries add extra safety to CD pipelines by automatically identifying issues and limiting the blast radius of faulty code and/or configuration.
Pipelines do not stop there. Simply because new code is deployed to production does not mean that new functionality is available. Feature flags provide product teams with Spinnaker-esque functionality for releasing features. While they are not provided by Spinnaker, they are a powerful ally in achieving CD.
Regardless of which strategy you choose, Spinnaker-backed deployment pipelines put your business at a best-in-class level, and Armory’s platform leverages Spinnaker to help your software teams ship better software, faster.