During World War I, canaries were used to detect toxic gases in the environment before humans could be harmed. They were also routinely used in coal mines where they would sing and chirp inside a small cage to signal the air was safe for the workers. Fortunately for the birds, they were later replaced with automated detectors.
While it's been a good few years since real canaries were used as early warning systems, the concept lives on in the software industry as canary analysis, an advanced automated testing technique that can run as part of the continuous delivery (CD) process.
Here's a brief introduction to what it is, how it works, and the way Netflix approaches it using Spinnaker.
What is Automated Canary Analysis (ACA)?
As explained in the eBook on Continuous Delivery With Spinnaker,
"A canary release is a technique to reduce the risk from deploying a new version of software into production. A new version of software, referred to as the canary, is deployed to a small subset of users alongside the stable running version. Traffic is split between these two versions such that a portion of incoming requests are diverted to the canary."
That's the gist of it. The point of this approach is to uncover if something is amiss and re-route traffic to the stable version before the majority of users are impacted.
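To make the traffic split concrete, here is a minimal sketch in Python. It is not how Spinnaker or any particular load balancer routes requests; the function names, the `canary_weight` parameter, and the random per-request choice are all assumptions made for illustration.

```python
import random


def pick_version(canary_weight: float, rng: random.Random) -> str:
    """Send roughly `canary_weight` of requests to the canary, the rest to stable."""
    if not 0.0 <= canary_weight <= 1.0:
        raise ValueError("canary_weight must be between 0 and 1")
    # rng.random() is uniform in [0, 1), so this is true about canary_weight of the time
    return "canary" if rng.random() < canary_weight else "stable"


def route_requests(n_requests: int, canary_weight: float, seed: int = 0) -> dict:
    """Simulate routing n_requests and count how many land on each version."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    counts = {"stable": 0, "canary": 0}
    for _ in range(n_requests):
        counts[pick_version(canary_weight, rng)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical split: about 5% of requests go to the canary
    print(route_requests(10_000, canary_weight=0.05))
```

Re-routing everything back to the stable version then amounts to setting the weight to 0.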
Why use ACA?
The "problem" with continuous delivery is that teams need systems in place to facilitate fast, frequent, and safe software releases.
Like most development teams, Netflix had a manual process in place for canary analysis. Typically, this process involved a (bored) developer staring at graphs and combing through logs before judging whether to move forward or roll back.
Needless to say, this process was time-consuming and made it difficult to keep up with the short delivery cycles of CD. Not to mention the final judgement call wasn't always the right one.
With ACA, rolling out new software releases is more reliable since manual processes are replaced with computed analysis. This helps ensure that only safe and stable deployments reach production.
How it works
In a nutshell, after the canary has been released, the quality of the new software version is evaluated by comparing metrics describing the old version against the same metrics for the new version.
If there is a significant difference in the metrics (and not in a good way), the canary will be aborted and all traffic will be routed back to the old, stable version.
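The comparison described above can be sketched roughly as follows. This is a simplified illustration, not the analysis Spinnaker performs: the 10% `tolerance`, the comparison of plain averages, and the `higher_is_worse` map are assumptions, and a production-grade judge would typically rely on a proper statistical test.

```python
from statistics import mean


def relative_change(old_values, new_values):
    """Relative change of the new version's average versus the old one."""
    old_avg = mean(old_values)
    if old_avg == 0:
        raise ValueError("old version average is zero; relative change is undefined")
    return (mean(new_values) - old_avg) / abs(old_avg)


def evaluate_canary(old_metrics, new_metrics, higher_is_worse, tolerance=0.10):
    """Return ("continue" or "abort", names of the metrics that got worse)."""
    degraded = []
    for name, old_values in old_metrics.items():
        change = relative_change(old_values, new_metrics[name])
        # For metrics where higher is better (e.g. throughput), flip the sign
        # so that a positive number always means "worse".
        worsening = change if higher_is_worse[name] else -change
        if worsening > tolerance:
            degraded.append(name)
    return ("abort" if degraded else "continue"), degraded


if __name__ == "__main__":
    # Hypothetical sample data for the old (stable) and new (canary) versions
    old = {"error_rate": [0.010, 0.012, 0.011], "throughput": [100, 102, 98]}
    new = {"error_rate": [0.030, 0.028, 0.031], "throughput": [101, 99, 100]}
    direction = {"error_rate": True, "throughput": False}
    print(evaluate_canary(old, new, direction))
```

Tracking the direction of each metric is what captures the "not in a good way" part: a canary that is significantly faster or less error-prone than the old version should not be aborted for being different.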
How Netflix uses automated canary release in Spinnaker
Spinnaker, the open-source CD platform built by Netflix, features a Canary stage that runs one or more iterations of the ACA before deciding whether to continue, roll back, or prompt manual judgement.
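One way to picture that three-way decision is the sketch below. It does not use Spinnaker's API or reproduce its exact rules; the 0 to 100 score per iteration, the `pass_score` and `marginal_score` thresholds, and their default values are assumptions chosen for illustration.

```python
from enum import Enum


class Decision(Enum):
    CONTINUE = "continue"
    ROLL_BACK = "roll back"
    MANUAL_JUDGEMENT = "manual judgement"


def decide(scores, pass_score=90.0, marginal_score=60.0):
    """Turn the scores of one or more analysis iterations into a decision."""
    if not scores:
        raise ValueError("at least one analysis iteration is required")
    if marginal_score > pass_score:
        raise ValueError("marginal_score must not exceed pass_score")
    # Any clearly bad iteration is enough to pull the canary
    if any(score < marginal_score for score in scores):
        return Decision.ROLL_BACK
    # Every iteration looked healthy, so the rollout can proceed
    if all(score >= pass_score for score in scores):
        return Decision.CONTINUE
    # Somewhere in between: leave the call to a human
    return Decision.MANUAL_JUDGEMENT


if __name__ == "__main__":
    for run in ([95, 97], [95, 40], [95, 75]):
        print(run, "->", decide(run).value)
```

Checking for a bad iteration first means one clearly failing run outweighs any number of good ones, which keeps the sketch on the cautious side.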
According to the eBook mentioned earlier, Netflix uses three different clusters to step up the vanilla canary release process.
The production cluster: This cluster holds the current software version and receives the majority of traffic.
The baseline cluster: This cluster runs a copy of the production version but receives less traffic.
The canary cluster: This cluster carries the new version and receives a small amount of traffic. It is compared against the baseline cluster rather than production, because a freshly launched baseline is free of the effects of long-running processes that could skew the comparison.
Spinnaker handles the lifecycle of the baseline and canary clusters. If all is well when comparing the metrics describing the canary cluster and the baseline cluster, the new version will be fully rolled out into a new cluster. If not, Spinnaker will pull the canary and redirect all traffic to the production cluster.
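The lifecycle of the three clusters can be modelled with a toy, in-memory sketch like the one below. Nothing here calls Spinnaker: `Deployment`, `run_canary`, and the `canary_is_healthy` callback are hypothetical names, and the full rollout is simplified to swapping the production version rather than creating a new cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    # Maps cluster name -> software version running in it
    clusters: dict = field(default_factory=dict)


def run_canary(deployment, new_version, canary_is_healthy):
    """Create baseline and canary, analyse them, then promote or pull the canary."""
    # The baseline is a fresh copy of whatever production currently runs
    deployment.clusters["baseline"] = deployment.clusters["production"]
    deployment.clusters["canary"] = new_version
    try:
        # The callback stands in for comparing canary metrics against baseline
        healthy = canary_is_healthy(deployment)
    finally:
        # The temporary clusters are torn down whatever the outcome,
        # leaving production to serve all traffic
        del deployment.clusters["baseline"]
        del deployment.clusters["canary"]
    if healthy:
        # Simplification: the source describes rolling out into a new cluster
        deployment.clusters["production"] = new_version
    return healthy


if __name__ == "__main__":
    deployment = Deployment({"production": "v1"})
    run_canary(deployment, "v2", canary_is_healthy=lambda d: True)
    print(deployment.clusters)
```

In this sketch a failed analysis leaves production exactly as it was, which mirrors the point of keeping the stable version running throughout the canary.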
Learn how to take advantage of ACA using Spinnaker
Clearly, this post is far from being an in-depth guide to using ACA in Spinnaker. If you want to know how to employ automated canary analysis techniques in a Spinnaker-managed deployment workflow, don’t miss the Automated Canary Analysis Workshop at Spinnaker Summit, led by software engineers from Google and Netflix.
They'll be providing a complete introduction to ACA, including how to set up the Canary stage in Spinnaker. You'll also have the opportunity to ask questions specific to your organization so you can truly take advantage of this technique.
But be quick: the Summit has limited seats, so register here to grab yours while they’re still available.