
Automated Canary Analysis with Spinnaker and Kayenta: A Deep Dive

Leveraging modern cloud-native deployment strategies is an important part of scaling distributed applications. These strategies let teams test changes safely and verify that applications are truly ready to run in production environments.

What is Canary Analysis?

One of these deployment strategies is the canary deployment: a pattern for rolling out a release to a subset of users or servers, testing the changes there, and then rolling them out to the rest of the fleet. The name comes from the idea of the “canary in the coal mine”. Canary deployments serve as an early warning indicator, limiting the blast radius of a bad release:

Canaries were once regularly used in coal mining as an early warning system. Toxic gases such as carbon monoxide, methane, or carbon dioxide in the mine would kill the bird before affecting the miners. Signs of distress from the bird indicated to the miners that conditions were unsafe. The use of miners’ canaries in British mines was phased out in 1987.

– Wikipedia

Canaries are usually run against deployments containing changes to code, but they can also be used for operational changes, including changes to the configuration.

Set Up Canary Analysis Support

Setting up automated canary analysis in Spinnaker consists of running a series of Halyard commands. Before you can use the canary analysis service, you must configure at least one metrics service, and at least one storage service.

The most common setup is to have one metrics service configured (e.g. Stackdriver, Atlas, Prometheus, Datadog or New Relic) and one storage service (e.g. S3, GCS or Minio) configured. For further details, here’s a comprehensive reference.
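As a sketch of an alternative setup, the analogous Halyard commands for a Prometheus-plus-S3 configuration look roughly like the following (exact flag names may vary by Halyard version, so treat this as illustrative and check the reference above):

```
hal config canary prometheus enable
hal config canary prometheus account add my-prometheus \
  --base-url http://prometheus:9090
hal config canary aws enable
hal config canary aws account add my-aws-account \
  --bucket my-kayenta-bucket \
  --region us-east-1
hal config canary aws edit --s3-enabled true
```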

This set of sample Halyard commands will enable Kayenta and configure it to retrieve metrics from Stackdriver and use GCS for persistent storage:

hal config canary enable
hal config canary google enable
hal config canary google account add my-google-account \
  --project $PROJECT_ID \
  --json-path $JSON_PATH \
  --bucket $MY_SPINNAKER_BUCKET
hal config canary google edit --gcs-enabled true \
  --stackdriver-enabled true

In these commands:

  • $PROJECT_ID is your GCP project ID.
  • $JSON_PATH points to your service account JSON file (don’t include quotes).
  • $MY_SPINNAKER_BUCKET points to a GCS bucket that accepts your credentials.

These can be the same values you used when configuring your other Spinnaker services (like Clouddriver).
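For example, you might export these variables before running the commands above (the values here are hypothetical; substitute your own project ID, key path, and bucket name):

```shell
# Hypothetical values -- replace with your own GCP project, key, and bucket
export PROJECT_ID=my-gcp-project
export JSON_PATH="$HOME/.gcp/kayenta-service-account.json"
export MY_SPINNAKER_BUCKET=my-spinnaker-canary-bucket
```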
Note: All canary-specific Halyard commands require Halyard version 0.46.0 or later. To update Halyard, run:

sudo update-halyard

or

sudo apt-get update && sudo apt-get install halyard

Next, set the Spinnaker version to v1.7.0 or higher:

hal config version edit --version 1.7.0

Lastly, update your Spinnaker deployment to include Kayenta:

hal deploy apply        (to a Kubernetes cluster)
sudo hal deploy apply   (to a local VM)

Configuring a Canary

Before you can add a canary stage to a pipeline, you need to configure what the canary consists of, including:

  • A name by which a canary stage can choose this config
  • The specific metrics to evaluate, and a logical grouping of those metrics
  • Default scoring thresholds (which can be overridden in the canary stage)
  • Optionally, one or more filter templates

Canary configuration is done per Spinnaker application. For each application set up to support canary, you create one or more configs.

Prerequisites

By default, canary is not enabled for new applications. Several things need to happen before you see the Canary tab in Deck:

  • The person or people setting up Spinnaker for you must set up Canary.
  • In the Application config, activate the Canary option.

Do this separately for all applications that will use automated canary analysis.

Create a canary configuration

You can create as many of these as you like, and when you create a canary stage, you must select a canary configuration to use. Configurations you create within an application are available to all pipelines in that application, but your Spinnaker might be set up so that all configurations are available to all applications.

 

  1. Hover over the Delivery tab, and select Canary configs.

Select __Canary configs__ from the __Delivery__ menu.

  2. Select Add configuration.
  3. Provide a Name and Description. This is the name shown in the stage config when you create a canary stage for your pipeline.

Create metric groups and add metrics

The metrics available depend on the telemetry provider you use. Spinnaker currently supports Stackdriver, Prometheus, Datadog, Signalfx, and New Relic.

Metrics are evaluated even if they’re not added to groups, but if you want to apply the weighting that determines the relative importance of different metrics, you need to add them to groups.

  1. Create any groups you want to organize the metrics into. Click Add Group to create each group you’ll use. Then select the group and click the edit icon to name it.
  2. In the Metrics section, select Add Metric.
  3. Select the group to add this metric to.
  4. Give the metric a name.
  5. Specify whether this metric fails when the value deviates too high or too low compared to the baseline.
  6. Optionally, choose a filter template. Here’s an example:
resource.type = "gce_instance" AND
resource.labels.zone = starts_with("${zone}")
  7. Identify the specific metric you’re including in the analysis configuration:
    • In the Metric Type field, type at least 3 characters to populate the field with available metrics. For example, if you type cpu you get a list of metrics available from your telemetry provider.

List of available metrics
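Filter templates like the one shown above are stored in the canary config’s templates map, keyed by name. As a sketch (the key zone_filter is a hypothetical name; the body is the Stackdriver filter from the example above), the resulting config fragment looks like:

```json
"templates": {
  "zone_filter": "resource.type = \"gce_instance\" AND resource.labels.zone = starts_with(\"${zone}\")"
}
```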

Add a canary stage to a pipeline

Once you have enabled canary analysis for your application and have one or more configs prepared, you can now add a canary stage to your pipeline and configure it to perform canary analysis for your deployment.

This stage type is for the canary analysis only. The canary stage doesn’t perform any provisioning or cleanup operations for you. Those must be configured elsewhere in your pipeline.

 

Define the canary stage

  1. In the pipeline in which you will run the canary, click Add stage
    • This pipeline needs to be in an application that has access to the canary configuration you want to use.
  2. For Type select Canary.
  3. Give the stage a name, and use the Depends On field to position the stage downstream of its dependencies.

Canary stage declaration

  4. Select the Analysis Type: either Real Time or Retrospective.
    • Real Time: The analysis happens for a specified time period, beginning when the stage executes (or after a specified Delay).
    • Retrospective: Analysis occurs over some specified period. Typically, this is done for a time period in the past, against a baseline deployment and a canary deployment which have already been running before this canary analysis stage starts.
  5. Specify the analysis configuration:
    • Choose the Config Name.
    • Set a Delay.
    • Set the Interval.
    • For Lookback Type, select Growing or Sliding.

Canary stage declaration

  6. Describe the metric scope.

Canary stage declaration

  7. Adjust the Scoring Thresholds, if needed.
  8. Specify the accounts you’re using for metrics and storage.
    • The Metrics Account points to the telemetry service provider account you configured earlier.
    • The Storage Account points to the GCS or S3 account you configured earlier.
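Putting the steps above together, the canary stage that Deck stores in the pipeline JSON has roughly the following shape. This is a sketch from a typical installation, not an authoritative schema: field names such as lifetimeDuration and canaryAnalysisIntervalMins may differ across Spinnaker versions, and the config ID placeholder is hypothetical.

```json
{
  "type": "kayentaCanary",
  "name": "Canary Analysis",
  "analysisType": "realTime",
  "canaryConfig": {
    "canaryConfigId": "<your-canary-config-id>",
    "lifetimeDuration": "PT4H",
    "beginCanaryAnalysisAfterMins": "5",
    "canaryAnalysisIntervalMins": "60",
    "scoreThresholds": { "marginal": "75", "pass": "95" }
  }
}
```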

The next section describes in detail what a Canary Judge is and how you can leverage it for Automated Canary Analysis. 

The Canary Judge

Most metric stores (Datadog, New Relic, AppDynamics, etc.) let users set thresholds. When a threshold is exceeded, an alarm or event gets triggered, and this alarm or event API can usually be queried. However, Graphite doesn’t natively support setting thresholds (although third-party plugins can add this). Therefore, you need to set these thresholds in Spinnaker.

We created a Judge (StaticBaselineJudge-v1.0) that allows you to set a static baseline parameter. When running a Canary Analysis, Spinnaker takes the value of this parameter and uses it to compare against the canary data.

Configuring the Static Baseline Judge

On the Canary Configuration page, when creating a new config, you can select either StaticBaselineJudge-v1.0 or the regular NetflixACAJudge-v1.0.


Select the Static Judge

To input the metric value needed, edit the config as JSON.


Set the following property under the metric’s analysisConfigurations:

"extendedProperties": {
  "staticBaseline": 300
}


You need to do this for each metric you want to compare against a Static Baseline. By default, if this property is not set, then the judge performs the same analysis that NetflixACAJudge-v1.0 does.

That means you can have multiple metrics in your Canary Config: ones that make use of the Static Baseline and others that use the regular Judge.

As an example of this, the following Canary Config has two metrics defined where one is setting the staticBaseline parameter and the other is not:

{
  "applications": [
    "training"
  ],
  "classifier": {
    "groupWeights": {
      "Group 1": 100
    }
  },
  "configVersion": "1",
  "createdTimestamp": 1569534009252,
  "createdTimestampIso": "2019-09-26T21:40:09.252Z",
  "description": "",
  "judge": {
    "judgeConfigurations": {},
    "name": "StaticBaselineJudge-v1.0"
  },
  "metrics": [
    {
      "analysisConfigurations": {
        "canary": {},
        "extendedProperties": {
          "staticBaseline": 300
        }
      },
      "groups": [
        "Group 1"
      ],
      "name": "canary",
      "query": {
        "customInlineTemplate": "PromQL:avg(container_spec_cpu_period{namespace=\"${location}\"})",
        "labelBindings": [],
        "metricName": "container_network_receive_bytes_total",
        "resourceType": "aws_ec2_instance",
        "serviceType": "prometheus",
        "type": "prometheus"
      },
      "scopeName": "default"
    },
    {
      "analysisConfigurations": {},
      "groups": [
        "Group 1"
      ],
      "name": "Regular Canary",
      "query": {
        "customInlineTemplate": "PromQL:avg(container_spec_cpu_period{namespace=\"${location}\"})",
        "serviceType": "prometheus",
        "type": "prometheus"
      },
      "scopeName": "default"
    }
  ],
  "name": "karlo-canario",
  "templates": {},
  "updatedTimestamp": 1572365396680,
  "updatedTimestampIso": "2019-10-29T16:09:56.680Z"
}

When running the above Canary Config on a Canary Stage, the value “300” gets used as the Baseline parameter for the analysis.


Using Automatic Canary Analysis with AWS CloudWatch

Now that we’ve discussed how to enable Kayenta and configure canary analysis, let’s try putting that knowledge to use by implementing something practical.

CloudWatch configuration

Let’s explore how we might implement canary analysis using metrics from a public cloud monitoring service such as AWS CloudWatch. To enable CloudWatch, update the AWS configuration entry in your kayenta-local.yml file: make sure METRICS_STORE is listed under supportedTypes, and add a cloudwatch entry with enabled: true.

The example below uses S3 as the object store and CloudWatch as the metrics store.

kayenta:
  aws:
    enabled: true
    accounts:
      - name: monitoring
        bucket: <your-s3-bucket>
        region: <your-region>
        rootFolder: kayenta
        roleName: default
        supportedTypes:
          - OBJECT_STORE
          - CONFIGURATION_STORE
          - METRICS_STORE
  cloudwatch:
    enabled: true
  s3:
    enabled: true

Canary configs

In the UI, you need to create a new canary config for the metrics you are interested in.


Add your Cloudwatch MetricStat JSON in the Template field.

{
    "Metric": {
        "Namespace": "kayenta",
        "MetricName": "integration.test.cpu.value",
        "Dimensions": [
            {
                "Name": "scope",
                "Value": "myapp-prod-canary-2"
            },
            {
                "Name": "namespace",
                "Value": "prod-namespace-2"
            }
        ]
    },
    "Period": 300,
    "Stat": "Average",
    "Unit": "None"
}
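Before wiring this into a canary config, you can sanity-check that the MetricStat actually returns data by issuing the equivalent query with the AWS CLI (this is an optional verification step, not part of the Kayenta setup; the GNU date expressions compute a one-hour window ending now):

```
aws cloudwatch get-metric-data \
  --metric-data-queries '[{"Id":"m1","MetricStat":{"Metric":{"Namespace":"kayenta","MetricName":"integration.test.cpu.value","Dimensions":[{"Name":"scope","Value":"myapp-prod-canary-2"},{"Name":"namespace","Value":"prod-namespace-2"}]},"Period":300,"Stat":"Average"}}]' \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
```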

Pipeline configs

In your canary stage, set up the canary config you just created. Then use the application values from CloudWatch to fill in the Baseline + Canary Pair and MetricScope fields.


Conclusion

Hopefully, this post has given you a deeper understanding of how Spinnaker leverages Kayenta for automated canary analysis. We’ve only scratched the surface of what you can achieve with Canary deployments in this post, so I encourage you to give the docs a look when you get a chance and play around with the feature a bit.