How do you grow and scale?

Nov 4, 2021 by Christos Arvanitis

Unlocking innovation by making software delivery reliable, scalable, and safe is complex. It becomes even more complex when your delivery platform cannot scale properly and fails to meet the needs of your development teams when they need it most.

Spinnaker is awesome, no question. We know Spinnaker well, and we love Spinnaker, which is why we know that operating Spinnaker is complex. Between configuring Spinnaker, managing the underlying infrastructure, monitoring and meeting the SLAs/SLOs, your team will have their hands full. This (without a doubt) will lead to operational inefficiencies that will eventually slow down your deployment frequency.

Scale Spinnaker for faster growth

Operating at scale comes with a lot of pain points for the team operating Spinnaker. Armory’s Managed team operates Armory Enterprise for Spinnaker environments at scale every day, so we’re very aware of these pain points. Given that, we’ve compiled a few tips to help you scale:

Let’s put all of the above into an example:

Consider a very simple use case where we have 30 AWS accounts, 10 ECS accounts, 100 Kubernetes clusters, and we need to onboard another 50 Kubernetes clusters with 2 cacheThreads.

A rough estimate for the current running caching agents would be:

30 x 20 + 10 x 10 + 100 x 2 x 2 = 1100 caching agents

Clouddriver will need an additional 200 caching agents (50 x 2 x 2) for the 50 new Kubernetes clusters, bringing the total to 1,300. The startup time of Clouddriver on every deployment could be up to 8 to 10 minutes, and we would need at least 13 Clouddriver replicas with the default concurrent caching agents config.

Applying some of the tips discussed above lets this scenario scale much further. For instance, by tripling the concurrent caching agents and vertically scaling Clouddriver, you would need only 5 Clouddriver replicas. Additionally, by switching to disks with 3000 baseline IOPS, you will drastically reduce the start-up time of your Spinnaker services.
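The arithmetic in this example can be sketched in a few lines of Python. This is a rough sizing aid, not an official formula: the per-account agent counts (20 per AWS account, 10 per ECS account, 2 per Kubernetes cache thread) are read off the estimate above, and the limit of 100 concurrent caching agents per replica is an assumption inferred from the 13-replica figure. The function and constant names are hypothetical.

```python
import math

# Assumed per-account agent counts, taken from the rough estimate in the text.
AGENTS_PER_AWS_ACCOUNT = 20
AGENTS_PER_ECS_ACCOUNT = 10
AGENTS_PER_K8S_CACHE_THREAD = 2

# Assumption: per-replica limit implied by "at least 13 replicas" for 1,300 agents.
ASSUMED_DEFAULT_CONCURRENT_AGENTS = 100


def caching_agents(aws_accounts, ecs_accounts, k8s_clusters, cache_threads):
    """Rough count of caching agents for a given set of accounts."""
    return (
        aws_accounts * AGENTS_PER_AWS_ACCOUNT
        + ecs_accounts * AGENTS_PER_ECS_ACCOUNT
        + k8s_clusters * cache_threads * AGENTS_PER_K8S_CACHE_THREAD
    )


def replicas_needed(total_agents, concurrent_agents_per_replica):
    """Minimum replicas so that every agent fits within the concurrency limit."""
    return math.ceil(total_agents / concurrent_agents_per_replica)


current = caching_agents(30, 10, 100, cache_threads=2)
after_onboarding = caching_agents(30, 10, 150, cache_threads=2)

default_replicas = replicas_needed(after_onboarding, ASSUMED_DEFAULT_CONCURRENT_AGENTS)
tuned_replicas = replicas_needed(after_onboarding, 3 * ASSUMED_DEFAULT_CONCURRENT_AGENTS)

print(current, after_onboarding, default_replicas, tuned_replicas)
```

With the numbers from the example, this gives 1,100 agents today, 1,300 after onboarding, 13 replicas at the assumed default limit, and 5 replicas once the limit is tripled. Real agent counts vary with the regions and resource types enabled per account, so treat the output as a starting point for capacity planning.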

Are these tips enough?

At Armory, we have open-sourced the Spinnaker Observability Plugin, and it is a great fit for monitoring when Spinnaker is operating at scale. Combined with the spinnaker-mixin dashboards, the plugin is a great solution for gaining insight into the operational performance of your Spinnaker platform and (by extension) your software delivery.

Your Spinnaker environment will perform faster, scale better, and be more resilient, but some operational burden will remain. For instance, every new Kubernetes account will require your team to set up the networking, proper permissions, and configuration, and then redeploy Clouddriver.

One option for addressing these ongoing operational issues is Armory Agent for Kubernetes. It can help you reach massive scale by enabling account management at the team level and by onboarding Kubernetes clusters on the fly, without redeploying Clouddriver.

For more information about either Armory’s Managed offering or the Armory Agent for Kubernetes, contact us. We’d love to hear from you!
