Moving to Microservices Is Much More Than A Technology Choice
Dec 31, 2019 by Isaac Mosquera
Since we started Armory three years ago, I’ve heard almost all of the Fortune 500 say they are embracing microservices for a pay-off in productivity and scalability. And yet most are unable to move their organizations toward this practice because they do not give their software engineers enough trust and autonomy.
Historically, we all built software as one monolith. We had one codebase. One deployment pipeline. One heterogeneous set of machines. One load balancer. One database. It was simple. But as the internet and its services grew, we realized that having hundreds or thousands of developers working in one codebase on a hundred different features isn’t scalable. We became bottlenecked. Not only that, but when a new release went out the door and failed, how would you even begin to debug the problem? Which line of code, touched by which of those thousands of developers, is the root cause? If one feature needed an upgrade to the database connector library, but that upgrade broke other features, how would you proceed? The solution to this problem was microservices. They remove the bottleneck by letting us break large teams into smaller ones that can move independently of each other while still being interdependent.
“Dependent people need others to get what they want. Independent people can get what they want through their own effort. Interdependent people combine their own efforts with the efforts of others to achieve their greatest success.” - Stephen Covey, The 7 Habits of Highly Effective People
While this solution sounds great, it does not come for free; there is, of course, a cost to everything. At first glance, the cost is the overhead of duplicating pipelines, codebases, and functionality. This tends to infuriate central DevOps teams. “Duplication and non-standard ways of building software are at the heart of this mess we’re in,” you’ll hear them scream.
It looks and feels like more chaos with less control.
Yet wasn’t that the point? Freedom and autonomy, even over control and centralization. When we create large, complex systems in which we don’t all need the same thing, we need to live by a looser set of rules. In moving from a monolith to microservices, we moved from a complicated system to a complex system. A complex system has a large number of components whose interactions are unpredictable. A complicated system, on the other hand, may have many parts, but its behavior is predictable. For example, think of a watch. While it has many moving parts, they must move in the same way every second of the day. The parts have a causal, predictable relationship. You do X, you get Y. Your market, on the other hand, is unpredictable. Like the weather, we can try to predict it, but there are far too many moving variables. The weather, like the marketplace, is a complex system.
When working with a complicated (predictable) system, rules, strict constraints, and controls are absolutely necessary to achieve an intended outcome. For example, if you move the big hand on your watch clockwise or counterclockwise, the small hand will follow whichever way you choose. When you input certain code, you know exactly what the outcome will be. What do you do, then, with a complex (unpredictable) system, like users who will use your system (predictable code) in ways you didn’t expect? How will you scale? What features will you build to accommodate this new user behavior? You create principles, guidelines, and systems that allow the flexibility to respond to the current environment, which in turn makes the system more resilient and able to adapt quickly. The problem arises when you treat a complex system like a complicated one.
To illustrate the point, below are the microservice architectures of Netflix and Amazon, the two companies that originally evangelized this architecture. Imagine trying to write rules that govern every interaction in systems this complex. It is impossible. We must think differently about how we operate.
What we are seeing is that companies want microservices and all of their rewards, but they are still treating them like a monolith. We call this the distributed monolith: all of the overhead costs and none of the benefits. At second glance, therefore, the cost of microservices is more cultural and psychological than technical. Truly embracing microservices requires letting go of control and the sense of security it brings, and fully trusting your engineers. Which brings us to service ownership.
What is Service Ownership?
Service ownership means owning and being responsible for all aspects of a service. Historically, it was the job of software architects to solve these kinds of problems for the monolith, but in a microservices world you can’t hire enough architects. Therefore, everybody is an architect. Everyone is an operations engineer. Everyone is a QA engineer. We must move to a world where trust and autonomy are given to each individual team to make the right decisions for itself based on real-time changes in its environment or market, rather than having decisions handed down from a centralized command that doesn’t have all the information. Service ownership is the best way to deal with the complexities of microservices.
The following are a few of the major areas of ownership:
- Tests (QA)
- Operations, Uptime, SLA
- Pager duty/on-call rotation
- Deployment Pipelines
- Paying down tech debt
- Feature development
It is important to note that these areas are owned not by just one person but by the entire team. The team should be a relatively small group of people, say 5-15, that can manage the complications of just one app. Notice we didn’t use the word complexity. Within a single app we can reason about its components, since they’re limited. We can build rules for a single microservice, but not for the system as a whole, as it is too complex. The moment we feel a single microservice is getting too complex (i.e. too many unpredictable components, too much functionality, or diverging scalability needs), we can split part of it out into its own microservice.
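One way to make these areas of ownership explicit is a small per-service record that each team fills in for itself. The sketch below is a hypothetical illustration, not an Armory or Spinnaker feature: the names `ServiceOwnership`, `OWNERSHIP_AREAS`, and `unowned_areas`, as well as the example service and team, are assumptions made for this example, and the team-size check simply mirrors the 5-15 range suggested above.

```python
from dataclasses import dataclass, field

# Areas of ownership taken from the list above (identifiers are illustrative).
OWNERSHIP_AREAS = (
    "tests",
    "operations",
    "on_call",
    "deployment_pipeline",
    "tech_debt",
    "feature_development",
)


@dataclass
class ServiceOwnership:
    """Hypothetical record of which areas a team has committed to owning."""

    service: str
    team: str
    team_size: int
    owned_areas: set = field(default_factory=set)

    def unowned_areas(self):
        """Return the listed areas the team has not yet taken on, in list order."""
        return [area for area in OWNERSHIP_AREAS if area not in self.owned_areas]

    def team_size_in_suggested_range(self):
        """Check the team against the 5-15 person range suggested in the text."""
        return 5 <= self.team_size <= 15


# Made-up example: a team that still relies on others for pipelines and tech debt.
checkout = ServiceOwnership(
    service="checkout",
    team="payments",
    team_size=8,
    owned_areas={"tests", "operations", "on_call", "feature_development"},
)
print(checkout.unowned_areas())
print(checkout.team_size_in_suggested_range())
```

A record like this is meant as a conversation starter for the team, not as a rule imposed from a central group; the gaps it surfaces are the areas where ownership is still shared or unclear.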
Importance of Culture
What do we mean when we say culture? We think of culture in terms of how we do everything. While many think corporate culture is about perks and benefits, we define it as how we operate together: the agreed-upon behaviors, beliefs, languages, systems, and mindsets that drive everything we do.
If we value trust and accountability, what does that look like?
Service ownership is not just an operating model; it is a belief and a behavior. The belief is that your developers are highly trusted and competent engineers, so much so that they have the responsibility and accountability of owning their work from pipeline to deployment. The belief is that when you give your developers this level of responsibility, they will take greater care at each stage because ultimately they are the owners. This level of responsibility also drives up engagement and satisfaction in one’s work.
The risk vs. the reward: many engineering leaders are too scared to offer this type of freedom and ownership. Understandably so; they have been trained throughout their lives to believe that control and structure lead to the best path. But as our world and the technology we use become more complex, the fear of letting go of control will be the very thing that stops us from advancing. We would urge anyone who remains in this way of thinking to ask themselves, “Is this way of working really getting the best out of your engineers? Do you think you are truly tapping into their potential? If this fear didn’t exist, what would you be willing to try?”
How to Move Forward
Full autonomy and ownership might be a frightening step for many, or it might be a natural next step in your organization’s cultural transformation. In either situation, the next step is to experiment. Make something safe to try. Take a more advanced team and give them full ownership of their services. Before you give them full control, it is important to agree as a team on expectations, on how the results will be measured, and on a timeline. Here are a few suggested measurements: uptime, incidents, productivity, mean time to resolution, team engagement, and the team’s work satisfaction. You can measure pretty much anything that determines success for your organization, but it is important that you start with something that is safe to try for everybody involved. Once you have success with one team working in this new style of freedom, autonomy, and ownership, you can add a second team and keep experimenting. In our experience, once you have tried this style of working in service ownership, you never go back. Once you see it and feel it, you will wonder, “What exactly took us so long?”
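Two of the suggested measurements, mean time to resolution and uptime, are easy to compute once a team keeps a simple log of its incidents. The following is a minimal sketch under stated assumptions: the `Incident` record and both function names are hypothetical, the sample incidents are made up, and the uptime calculation assumes incidents do not overlap and fall entirely inside the measurement window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record: when it started and when it was resolved."""

    started: datetime
    resolved: datetime


def mean_time_to_resolution(incidents):
    """Average duration from start to resolution, or None if there were no incidents."""
    if not incidents:
        return None
    total = sum((i.resolved - i.started for i in incidents), timedelta())
    return total / len(incidents)


def uptime_percent(incidents, window_start, window_end):
    """Percentage of the window not spent in an incident.

    Assumes incidents do not overlap and lie fully inside the window.
    """
    window = window_end - window_start
    downtime = sum((i.resolved - i.started for i in incidents), timedelta())
    return 100.0 * (1 - downtime / window)


# Made-up data for a 30-day experiment with one team.
window_start = datetime(2020, 1, 1)
window_end = window_start + timedelta(days=30)
incidents = [
    Incident(datetime(2020, 1, 5, 10, 0), datetime(2020, 1, 5, 10, 30)),
    Incident(datetime(2020, 1, 20, 14, 0), datetime(2020, 1, 20, 15, 30)),
]

mttr = mean_time_to_resolution(incidents)
uptime = uptime_percent(incidents, window_start, window_end)
print(f"MTTR: {mttr}, uptime: {uptime:.2f}%")
```

Agreeing on definitions like these before the experiment starts matters more than the code itself: the team and its leaders should be comparing the same numbers at the end of the timeline.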
Start by starting. You will be amazed at what’s possible.
About the Authors
Kate MacAleavey, Head of Culture and Leadership – is an expert in the area of positive organizational psychology and utilizes a strengths-based model to increase job satisfaction, employee commitment, trust, engagement, and many more important psychological traits. She previously worked as a consultant in large scale culture transformation and was the head of Individual Contributor development at Facebook. Kate has an M.A. in Positive Organizational Psychology and Evaluation and an MBA, both from Claremont Graduate University.
Isaac Mosquera, CTO/Co-Founder – has been leading an engineering and product team engaged in the Spinnaker community for three years. Prior to this, he spent 20 years architecting large-scale systems at companies such as ShareThis, Socialize and XM Satellite Radio, as well as creating high scalability/load websites for brands such as Games Radar by Future Publishing. He co-founded Armory to help enterprises innovate through happy & productive engineering teams.