Using Your Existing Automation In Your Deployment Pipeline

May 2, 2022 by Stephen Atwell

When you adopt a new deployment pipeline, it’s critical that you can easily leverage your existing test processes. Continuous deployment to production isn’t as simple as just deploying code: you need to deploy to test environments along the way to validate that the code works properly, and you only want to deploy to production if those tests pass. During deployment to production, you also want health checks to ensure that your application is working as expected. This blog explores how Armory’s design partners leverage existing automation while deploying with our new Continuous Deployment-as-a-Service offering, Project Borealis.

Project Borealis currently supports two classes of automated validation in its deployment pipeline: automated canary analysis and webhook-based approvals.

Automated canary analysis allows you to reuse your existing monitoring queries to ensure that your metrics remain healthy. Webhook-based approvals can invoke REST APIs, allowing you to reuse your existing automated tests.

Validation in Staging

Our design partners are doing verification within afterDeployment constraints. These tests trigger after a staging environment has been successfully deployed. If all tests pass, the deployment moves on to production. If any test fails, the production deployment is canceled. Let’s discuss some of the existing tools our design partners are leveraging for this validation.

Integration and Smoke Tests

Some companies automate only unit tests in CI because deployed environments introduce extra complexity. However, most companies have some integration tests, even if they are run only manually. Typically these tests exercise end-to-end use cases across multiple deployed services. They are frequently run within a company’s existing CI tooling, and either pass or fail. Our design partners are replacing manual triggers of these tests by triggering them from afterDeployment constraints, which automatically ensures the tests pass before any production deployment.
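As a rough sketch, wiring an existing integration suite into a staging target could look like the following. The target, account, and webhook names here are illustrative, not taken from a real configuration:

```yaml
targets:
  staging:
    account: staging-cluster      # illustrative account name
    namespace: myapp
    strategy: rolling
    constraints:
      afterDeployment:
        # Trigger the existing CI-hosted integration suite; promotion to
        # production proceeds only if this webhook reports success.
        - runWebhook:
            name: integrationTests   # hypothetical webhook name
```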

Metric Tests

Automated canary analysis can run in your staging environment to check whether it remained healthy while your integration tests ran. If your staging environment has behavior during testing that would trigger an alert in production, you can abort deployment before any users are affected. If all the metrics stay within their typical thresholds, then you can safely deploy to production.
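One way to sketch this, assuming analysis steps can appear in afterDeployment constraints as described above, is to evaluate your existing metric queries against staging after the tests run (the query names below are illustrative):

```yaml
constraints:
  afterDeployment:
    # Check that staging metrics stayed within their thresholds while the
    # integration tests ran; an out-of-bounds metric cancels promotion.
    - analysis:
        interval: 30
        units: seconds
        numberOfJudgmentRuns: 5
        queries:
          - cpuTime
          - errorRate   # illustrative query name
```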

Security Scanners

Many automated tests exist that scan applications for common vulnerabilities. These fall into two categories: static code analysis (e.g., SonarQube, Coverity) and run-time scanners (e.g., Arachni, Golismero, XSSPY). CI systems typically run static code analysis before compiling code. However, run-time scanning in a deployed environment is the only way to detect some issues. Because you must deploy the code to run the scan, it’s not uncommon to run runtime scanners on a schedule instead of before every code push to production. Project Borealis can leverage these scanners to ensure that every image passes the scan in a staging environment before it is ever deployed to production.

The DevOps community has also been adopting frameworks (e.g., Sigstore) for signing software during release, so that those signatures can be verified in production. This ensures no image that didn’t pass tests is deployed, even if it bypasses the standard deployment pipeline. Such signatures can be attached after validation in staging; a signature states which test was run and that it passed. The signature can then be verified either by an admission controller within your cluster or in a before deployment constraint. Admission controllers have an advantage for signature verification: they verify the signature on the image exactly as downloaded by the cluster, whereas a before deployment constraint would verify it only as downloaded by an automated test.
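As a hedged sketch, the before-deployment variant could be wired in as a webhook that calls out to your existing verification tooling. The webhook name and context keys here are hypothetical:

```yaml
constraints:
  beforeDeployment:
    # Call an existing job that verifies the image signature (for example,
    # via Sigstore tooling); a failed verification cancels the deployment.
    - runWebhook:
        name: verifyImageSignature   # hypothetical webhook name
        context:
          image: myorg/myapp:1.2.3   # illustrative image reference
```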

Before Deployment Requirements

Armory Deployments in Project Borealis

Our design partners have existing automation they need to trigger before certain environments start their deployment. Here we will cover the reasons why.

Database Upgrades

Some of our design partners are deploying stateful applications, and others stateless ones. Some of the stateful applications have existing scripts that must run to upgrade the database schema. If the database upgrade fails, the deployment to that environment is canceled. Otherwise, as soon as the upgrade completes, the environment starts deploying the new application version.

Existing Approval Workflows

Some of our design partners have complicated, cross-team approval workflows that they have already automated in external systems. For example, one of them tracks a multi-level change review process in Jira that must complete before a change reaches production. These teams leverage webhooks within before deployment constraints to trigger the existing automation that ensures the needed approvals have occurred. The deployment waits for the Jira process to approve or reject the change: if Jira approves the change, the deployment continues; if the process rejects it, the deployment is automatically canceled.
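A gate like that might be sketched as follows, with a hypothetical webhook that checks the Jira workflow and calls back once the change is approved or rejected:

```yaml
constraints:
  beforeDeployment:
    # Wait for the external change-review automation before deploying.
    - runWebhook:
        name: checkJiraApproval      # hypothetical webhook name
        context:
          issueKey: OPS-1234         # illustrative Jira issue key
```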

During a Canary Deployment

Leveraging Webhooks in a canary strategy

Canary deployments allow customers to iteratively increase how much traffic reaches the new version of the software, running tests between each traffic increase. Most deployment pipelines that support canary deployments can perform automated canary analysis: query a metric provider and decide whether to continue the deployment based on whether the metric is healthy. Typically this is done using the customer’s existing metric queries from their existing observability solution.

In addition to leveraging automated metric queries, we also have design partners who are using webhooks to run their existing smoke tests on their deployed version, or to check the application logs for errors. These tests can check additional elements of application health beyond simple metrics, and ensure the application is operating as expected.

During canary deployment, the application can be automatically rolled back if any of the automated checks fail. Assuming all automated checks pass, the deployment increases traffic again and reruns the tests. Once traffic reaches 100%, the deployment has completed successfully.
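Putting this together, a canary strategy with analysis and webhook steps between traffic increases might look roughly like the following. The strategy name, weights, and webhook name are illustrative:

```yaml
strategies:
  canary-with-tests:
    canary:
      steps:
        - setWeight:
            weight: 25        # send 25% of traffic to the new version
        - analysis:           # reuse existing metric queries
            interval: 7
            units: seconds
            numberOfJudgmentRuns: 3
            queries:
              - cpuTime
        - runWebhook:         # run existing smoke tests via webhook
            name: smokeTests  # hypothetical webhook name
        - setWeight:
            weight: 100       # all checks passed; finish the rollout
```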

During a Blue/Green Deployment

Leveraging automated analysis and webhooks in a blue green strategy

A blue/green deployment starts by deploying the new version of your application in a mode where it receives no production traffic. The new version is often exposed via a preview URL for either manual or automated validation. After deployment, but before redirecting production traffic, design partners use webhooks to run their existing test suites, and they also leverage automated canary analysis to ensure the new version’s metrics are healthy.

If the new version is healthy, a blue/green deployment redirects production traffic to it. After the redirect, the old version is kept around for a while to provide instantaneous rollback if an issue is uncovered. During this period, our design partners leverage canary analysis to ensure that the metrics remain healthy. They also leverage webhooks to trigger their existing automation to run smoke tests and check logs for errors. If a test detects an issue, it triggers a rollback, redirecting traffic back to the old version. Assuming everything passes, the old version is shut down and the deployment is complete.
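A blue/green strategy along these lines might be sketched as follows. The service names are illustrative, and the exact field names (redirectTrafficAfter, shutDownOldVersionAfter) are assumptions based on common blue/green configuration rather than a verified schema:

```yaml
strategies:
  blue-green-validated:
    blueGreen:
      activeService: myapp-active    # receives production traffic
      previewService: myapp-preview  # preview URL for validation
      redirectTrafficAfter:
        # Validate the new version before it receives production traffic.
        - runWebhook:
            name: smokeTests         # hypothetical webhook name
        - analysis:
            interval: 7
            units: seconds
            numberOfJudgmentRuns: 3
            queries:
              - cpuTime
      shutDownOldVersionAfter:
        # Keep watching metrics before tearing down the old version,
        # preserving the instantaneous-rollback window.
        - analysis:
            interval: 30
            units: seconds
            numberOfJudgmentRuns: 3
            queries:
              - cpuTime
```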

How do webhooks work in Project Borealis?

In Project Borealis, a user centrally configures a template of the webhook to invoke. The following example template invokes a GitHub Action:


- name: UpgradeDatabaseSchema
  method: POST
  uriTemplate: https://api.github.com/repos/myorg/myrepo/dispatches
  networkMode: direct
  headers:
    - key: Authorization
      value: token {{secrets.github_token}}
    - key: Content-Type
      value: application/json
  bodyTemplate:
    inline: >-
      {
        "event_type": "eventToTrigger",
        "client_payload": {
          "callbackUri": "{{armory.callbackUri}}/callback",
          "environment": "{{context.environment}}"
        }
      }
  retryCount: 3

Webhook Lifecycle in Project Borealis
When invoking a webhook, Armory provides a callback URL that the service receiving the webhook must call once its test completes. This callback mechanism increases reliability for long-running tests, since Armory does not need to maintain a continuous network connection while the test runs. When invoking the callback, the test states whether it succeeded or failed; the deployment continues on success and rolls back on failure. The webhook can also provide a message to the user explaining why a particular test failed.
Once a customer has defined a named webhook, they can reuse it across multiple environments and strategies within the deployment. A reference states that the webhook should run and can optionally pass any context needed by the invoked API.


constraints:
  beforeDeployment:
    - runWebhook:
        name: UpgradeDatabaseSchema
        context:
          environment: production
Some webhooks are internet-accessible, but others are reachable only within the customer’s infrastructure. Webhooks can optionally connect through Armory’s Remote Network Agent, which allows triggering APIs that are not internet-accessible, such as that of an internal Jenkins server.
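Routing a webhook through the agent is a matter of its network mode. In this sketch, the webhook name, internal URL, and agent identifier are all illustrative placeholders:

```yaml
- name: triggerInternalJenkins        # hypothetical webhook name
  method: POST
  uriTemplate: https://jenkins.internal.example.com/job/smoke-tests/build
  networkMode: remoteNetworkAgent     # route through the Remote Network Agent
  agentIdentifier: my-cluster-agent   # illustrative agent name
```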

How does Automated Canary Analysis work in Project Borealis?

Project Borealis can natively run queries against Datadog, New Relic, or Prometheus. You can copy these queries from your existing monitoring, since they are written in the metric provider’s query language. Armory adds several fields, such as the name of the deployed ReplicaSet, which you can use to filter queries to the specific application version being deployed. Just like webhooks, queries are reusable across your deployment. Here is an example definition of a Prometheus query for average CPU time:


queries:
  - name: cpuTime
    upperLimit: 10000
    lowerLimit: 0
    queryTemplate: >-
      avg (avg_over_time(container_cpu_system_seconds_total{job="kubelet"}[{{armory.promQlStepInterval}}]) * on (pod) group_left (annotation_app)
      sum(kube_pod_annotations{job="kube-state-metrics",annotation_deploy_armory_io_replica_set_name="{{armory.replicaSetName}}"})
      by (annotation_app, pod)) by (annotation_app)

As with webhooks, your deployment can reference these queries from multiple places for easy reuse:


- analysis:
    interval: 7
    units: seconds
    numberOfJudgmentRuns: 3
    queries:
      - cpuTime
      - memoryUsage

Conclusion

We’ve covered how to leverage your existing automation within your continuous deployment pipeline to accelerate the adoption of CD, and we’ve shown how our design partners use this automation in Armory’s new Continuous Deployment-as-a-Service offering, Project Borealis. While this offering is rapidly approaching GA, we are still accepting new design partners, so if you’d like to use it to deploy your application, let us know.
