Announcing the New Spinnaker Observability Plugin
Dec 16, 2020 by Stu Posluns
The Armory team is excited to announce that the Spinnaker Observability plugin, released to the open source community in June, is now GA!
The Observability plugin, a great example of the growing Spinnaker plugin ecosystem, provides critical visibility to Spinnaker operators and replaces the Spinnaker Monitoring daemon, which is deprecated as of OSS 1.20.
So, what exactly does the Observability plugin do, and why is it valuable to me as a Spinnaker operator?
A little history and some definitions…
Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. As Charity Majors succinctly put it in a great deep dive on the topic, “this means that one can determine the behavior of the entire system from the system’s outputs. If a system is not observable, this means that the current values of some of its state variables cannot be determined through output sensors.”
Back in the day, the concept of observability could be summed up as “are temperature alarms going off in the server room?” or, even more simply, “is the server melting down?”
Thankfully, there has been a lot of progress since then in observing and monitoring your infrastructure. Commensurate with that progress, however, expectations have risen. Service availability at the highest level simply isn’t a “nice to have” anymore.
Spinnaker was developed in part to bring that high degree of fidelity and availability to applications in production. The Spinnaker platform is powerful, underpinning large-scale multi-cloud deployments at many of the world’s largest enterprises. Under the hood, this functionality and sophistication are powered by 11 different microservices. Observing the behavior of all of these different microservices, and the platform as a whole, is critical in keeping Spinnaker itself in a healthy, optimal state.
Spinnaker is quite verbose in terms of metrics, which in many ways is great for observability into the system. Depending on your particular setup, Spinnaker could easily be producing 70k unique metrics per minute. Looking at real-time metrics streams, historical logs, and even tracing (with the help of third-party APM vendors) can provide very granular observability into Spinnaker. Anything that you want to observe, from the frequency of your pipeline deployments to what might be causing latency in the Clouddriver service, can be surfaced. The only downside of all of this data is that you have to know exactly what you are looking for and where it lives in order to make it valuable.
Up until recently, the best window for observability into Spinnaker was the Spinnaker Monitoring daemon. The daemon uses a Spectator endpoint to scrape the metrics, process them, and feed them to other tools.
So, why did we build the Observability plugin?
With all of the data you would ever want available, and with a monitoring service already in place, why build something new?
A few reasons. The Monitoring service was difficult to use, expensive to run, and did not scale well. As a result, it was not widely used by the community, and there was little support for continuing to maintain it. As of OSS 1.20, the Monitoring service was considered abandonware.
This gave Armory the opportunity to reintroduce observability into Spinnaker with a clean slate. We created a simpler, more flexible, more scalable implementation that will be easier to use and well maintained by Armory and the rest of the community. In fact, we already rely heavily on the Observability plugin to help us maintain the Spinnaker instances of our Managed customers.
Side note: This is a great opportunity to give a much-deserved shoutout to Karl Skewes and the rest of the Uneeq team for their amazing contributions to the Observability plugin, particularly in making major updates to the dashboards.
Why should I implement the Observability plugin?
The new Observability plugin brings a number of benefits to Spinnaker operators.
If you weren’t using the Monitoring daemon before, then the benefit of the plugin is pretty simple: gain the observability you need to keep Spinnaker highly performant and available.
When transitioning from the Monitoring daemon, there are a number of important benefits:
- More focus: We’ve enabled a set of filters to allow you to cull the metrics we don’t think are particularly relevant, so that you can focus on the most important set of metrics. We will be doing a lot more here in future releases, which should reduce cardinality issues and lower the costs of tracking metrics data.
- Cost savings: On the topic of lowering costs, the Monitoring daemon is highly resource-intensive in terms of CPU and memory. Switching to the Observability plugin will knock 10-20% off the cost of running Spinnaker, with greater anticipated reductions in the future.
- Better dashboards: The Observability plugin comes out of the box with a number of helpful dashboards (and the ability to easily make more of your own). We’ve populated these dashboards with our view on the most important metrics to measure, and are actively updating them.
- Easier data aggregation: Today, each Spinnaker metric has the service name in the metric name, meaning you will potentially have 11 different versions of each metric, making it difficult to get cross-Spinnaker visibility. The Observability plugin strips the service name out of the metric name and turns it into a label, making it much easier to aggregate, filter, and group metrics across Spinnaker.
- No more abandonware: Armory and others in the community are invested in, and committed to, the plugin and will continue to actively maintain it.
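To make the filtering and relabeling ideas above a little more concrete, here is a minimal Python sketch. It is not the plugin's code: the `service.metric` naming scheme, the service list, and the excluded prefix are all assumptions made up for illustration. It only shows why moving the service name into a label makes cross-service aggregation a one-liner.

```python
from collections import defaultdict

# Hypothetical values for illustration only; not the plugin's real configuration.
SERVICES = {"clouddriver", "orca", "gate", "echo"}
EXCLUDED_PREFIXES = ("jvm.gc.",)  # pretend these metrics are not relevant to us


def split_service(raw_name):
    """Turn 'service.metric.name' into ('metric.name', {'service': 'service'})."""
    prefix, sep, rest = raw_name.partition(".")
    if sep and prefix in SERVICES:
        return rest, {"service": prefix}
    # No known service prefix: leave the name untouched and add no label.
    return raw_name, {}


def is_kept(metric_name):
    """Drop metrics whose (service-free) name starts with an excluded prefix."""
    return not metric_name.startswith(EXCLUDED_PREFIXES)


def total_by_metric(samples):
    """Sum values across services once the service name is out of the metric name."""
    totals = defaultdict(float)
    for raw_name, value in samples:
        name, _labels = split_service(raw_name)
        if is_kept(name):
            totals[name] += value
    return dict(totals)


samples = [
    ("clouddriver.http.requests", 120.0),
    ("orca.http.requests", 80.0),
    ("gate.http.requests", 50.0),
    ("orca.queue.depth", 7.0),
    ("echo.jvm.gc.pause", 3.0),
]
print(total_by_metric(samples))
```

With the service name still embedded in the name, the three `http.requests` series above would be three unrelated metrics; once it is a label, they share one name and can be summed, filtered, or grouped by `service`.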
How does the Observability plugin work?
The Observability plugin works in two key ways:
- It enables customizing the Micrometer registry
- It exposes an OpenMetrics endpoint for the Micrometer/Spectator metrics, which lets Prometheus, or any platform that can read open metrics formats, collect them without needing a sidecar
These allow for key improvements in Spinnaker observability, such as removing the service name from each metric name so that you can more easily organize and filter metrics for cross-Spinnaker visibility. They also create a more efficient implementation and dramatically reduce CPU and memory usage.
Today, the Observability plugin natively supports Prometheus and New Relic, though it will be easy to add any other vendor supported by Micrometer (which is most of them).
For now, the Observability plugin does not provide logging or tracing functionality – the existing third-party solutions out there, which integrate with Spinnaker already, are great. That said, as we see greater usage of the plugin in the field, we are constantly on the lookout for where to add new functionality that will enhance the value of the service.
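For a rough feel of what a scraper sees on such an endpoint, the sketch below renders a few samples in the Prometheus-style text format (`name{label="value"} value`, ending with the `# EOF` marker that OpenMetrics uses). This is a simplified illustration, not the plugin's actual output: the metric names and values are made up, and label-value escaping, `# TYPE` lines, and timestamps are left out.

```python
import re


def sanitize(name):
    """Replace characters that are not legal in a Prometheus metric name."""
    return re.sub(r"[^a-zA-Z0-9_:]", "_", name)


def render(samples):
    """Render (name, labels, value) tuples as text exposition lines."""
    lines = []
    for name, labels, value in samples:
        if labels:
            label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{sanitize(name)}{{{label_text}}} {value}")
        else:
            lines.append(f"{sanitize(name)} {value}")
    lines.append("# EOF")  # end-of-exposition marker used by OpenMetrics
    return "\n".join(lines)


# Hypothetical samples: the service lives in a label, not in the metric name.
samples = [
    ("http.requests", {"service": "clouddriver"}, 120.0),
    ("http.requests", {"service": "orca"}, 80.0),
    ("queue.depth", {"service": "orca"}, 7.0),
]
print(render(samples))
```

Because every line carries the same metric name with a different `service` label, a tool like Prometheus can query one name and slice by service, rather than tracking a separate metric per microservice.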
How do I get started?
Installing the Observability plugin is simple – check out our docs. The plugin is open source, so once you have it up and running, please submit feedback on GitHub or, better yet, contribute to the plugin directly. We are always looking for improvements. We also love to hear stories about how you are using the plugin (or Spinnaker in general), and those stories directly influence the new features we are building, so please blog, tweet, post in Slack, email us, and @mention us!