Reducing Risk To Deliver Your Kubernetes Migration On Time
Some organizations thrive in this age of infrastructure innovation, but we notice that most enterprises struggle to deliver projects on time while migrating to Kubernetes or a public cloud. Many enterprises take years to move a fraction of their workloads onto new technology, while a small percentage move quickly and get immediate benefits. Every variable you add to a project adds risk. Reduce the number of variables and people involved, and you reduce risk while making it possible to migrate incrementally and show progress along the way.
Only Measure Progress in Production
If it’s not in production, it doesn’t count. While staging environments may look ready for prime time, there are always unexpected hurdles before going to production. Production is always further away than it appears. At some point, the engineering leader must set up production milestones or deadlines that drive progress. The main culprits behind delays to production are security and compliance, which are typically (always) considered near the end of the project even though they are critical to running successful infrastructure at the enterprise level.
This tactic encourages the team to work in small batches. In doing so, you reduce the scope of requirements and get faster feedback on what is working or not in production.
Reduce The Number of Variables & Technology Footprint
Introducing new technology is always challenging. As customers migrate to Kubernetes, they introduce a multitude of new technologies right out of the gate. Service meshes, traffic routing, operators, compliance and security tools, and even CI/CD tools all add a significant amount of overhead and risk. Your end state or vision may be to “automate all the things,” but that complexity and risk could delay project delivery by years, not just weeks or months. Years might sound like an exaggeration, but you’ve likely seen the results of scope creep and these decisions first-hand.
The key to success is reducing the requirements for getting your first set of applications into production. Do that by removing frivolous requirements (like a service mesh, if you didn’t have one before) and by choosing applications that have few or no external dependencies, such as databases, external APIs, or services provided by an external vendor.
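One way to make that selection concrete is to tally each application’s external dependencies and start with the ones that have the fewest. The sketch below assumes you already have some kind of service inventory; the `App` fields, the `first_wave_candidates` helper, and the example application names are all hypothetical, chosen only to illustrate the heuristic.

```python
from dataclasses import dataclass, field


# Hypothetical inventory entry: the fields and example apps are illustrative
# assumptions, not taken from any real catalog or tool.
@dataclass
class App:
    name: str
    databases: list[str] = field(default_factory=list)
    external_apis: list[str] = field(default_factory=list)
    vendor_services: list[str] = field(default_factory=list)

    def external_dependency_count(self) -> int:
        """Count everything the app needs that lives outside the app itself."""
        return len(self.databases) + len(self.external_apis) + len(self.vendor_services)


def first_wave_candidates(apps: list[App], max_deps: int = 0) -> list[str]:
    """Return names of apps with at most `max_deps` external dependencies,
    fewest dependencies first (ties broken by name for a stable order)."""
    eligible = [a for a in apps if a.external_dependency_count() <= max_deps]
    eligible.sort(key=lambda a: (a.external_dependency_count(), a.name))
    return [a.name for a in eligible]


if __name__ == "__main__":
    inventory = [
        App("billing", databases=["postgres"], vendor_services=["payments-gateway"]),
        App("status-page"),
        App("image-resizer", external_apis=["object-storage"]),
        App("docs-site"),
    ]
    print(first_wave_candidates(inventory))              # ['docs-site', 'status-page']
    print(first_wave_candidates(inventory, max_deps=1))  # ['docs-site', 'status-page', 'image-resizer']
```

Raising `max_deps` is a deliberate, visible decision to widen the next wave, which keeps the scope of each batch explicit rather than letting it creep.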
Find Tools that Help Transition Between Solutions & Concepts
Going directly to containerization and Kubernetes can seem like the shortest path between two points, but in reality the technologies are very different and can be intimidating even to seasoned veterans. Concepts like immutability, self-healing, and auto-scaling infrastructure aren’t unique to Kubernetes, either. AWS has offered tools for them for a long time, and Netflix has been leveraging them for almost a decade.
Below is a set of tools that can help transition teams into the world of containerization:
- Packer is an excellent tool to help ease you into concepts like immutability with VMs instead of taking a big leap to containers.
- Auto Scaling Groups (ASG) on AWS or Managed Instance Groups (MIG) on GCP. Get comfortable with the idea that machines in the cloud can die at any time for any reason, and that there are tools for self-healing. These groups also remove the reliance on IPs as a way of discovering services.
- Your existing Jenkins or CI/CD tooling. Use it to begin your journey; there’s no need to start from scratch. Start transitioning toward a more cloud-native, container-friendly process with the tools you already have. The behavioral and process-oriented changes are more complicated than the tools themselves. By keeping your existing tools while iterating on process and behavior, you reduce the pushback you’ll get from your teams as you steam ahead on your Kubernetes journey.
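The self-healing idea behind ASGs and MIGs can be reduced to a reconciliation loop: compare the number of healthy instances to a desired count and launch or remove machines until they match. The toy model below is a sketch of that idea only; it does not call any cloud API, and the instance IDs, IP addresses, and the oldest-first scale-in rule are simplifying assumptions rather than how AWS or GCP actually behave.

```python
import itertools


class InstanceGroup:
    """Toy model of a self-healing instance group (not a real cloud API)."""

    def __init__(self, desired: int):
        self.desired = desired
        self._ids = itertools.count(1)
        self.instances: dict[str, str] = {}  # instance id -> private IP (made up)
        self.reconcile()

    def _launch(self) -> None:
        n = next(self._ids)
        # Each replacement gets a fresh ID and IP; nothing from a dead machine is reused.
        self.instances[f"i-{n:04d}"] = f"10.0.0.{n}"

    def fail(self, instance_id: str) -> None:
        """Simulate a machine dying at any time, for any reason."""
        self.instances.pop(instance_id, None)

    def reconcile(self) -> None:
        """Converge the actual instance count toward the desired count."""
        while len(self.instances) < self.desired:
            self._launch()
        while len(self.instances) > self.desired:
            # Simplification: scale in by removing the oldest surviving instance.
            oldest = next(iter(self.instances))
            self.instances.pop(oldest)

    def endpoints(self) -> list[str]:
        """Discover current members from the group instead of hard-coding IPs."""
        return list(self.instances.values())


if __name__ == "__main__":
    group = InstanceGroup(desired=3)
    print(group.endpoints())  # ['10.0.0.1', '10.0.0.2', '10.0.0.3']
    group.fail("i-0002")
    group.reconcile()
    print(group.endpoints())  # ['10.0.0.1', '10.0.0.3', '10.0.0.4']
```

Because the replacement comes back with a different address, any client that cached `10.0.0.2` would break; asking the group for its current members is what frees you from treating IPs as service identities.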
The tools you have today are more than adequate to start realizing much of the value that Kubernetes provides.
You can abstract the complexities of the underlying infrastructure as much as you want, but that does not completely absolve application engineers from knowing the details of the platform. When errors or production outages occur, the details matter.
Education plays a critical role in being able to debug issues. While Kubernetes and its ecosystem of tools make our lives easier, it always takes longer than expected to become comfortable with them. Investing here is important; combined with a reduction in variables, it allows you to design incremental courses that help onboard your developers instead of overwhelming them.
Don’t Expect Your Platform to Practice Resiliency For You
Not all education has to come in the form of an online course or workshop. Creating controlled failure scenarios in a production environment is a great way to give engineers hands-on experience with the platforms they’re building on. Practice monthly “game days” that allow engineers of all skill levels on the team to get familiar with debugging failures in production.
The concept is simple:
- A “game master” designs scenarios that affect a portion of your production cluster.
- The “primary driver” types commands into a laptop while the rest of the team is there to support them.
The team’s goal is to identify what is wrong with the production cluster within an allotted amount of time and (ideally) fix the issue then and there. After each scenario, hold a mini retrospective to identify areas the team can improve upon next time. This builds trust among teammates, gives them practical experience operating their services in production, and highlights weaknesses in the team’s ability to observe issues.
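A game master may find it useful to script scenarios so they are repeatable and always paired with a way to undo them. The sketch below builds real `kubectl` commands (`scale`, `drain`, `uncordon`) but defaults to a dry run that only prints them; the `Scenario` class, the helper functions, the 30-minute default time limit, and the `checkout`/`shop`/`node-1` names are hypothetical placeholders for your own services.

```python
import shlex
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """A hypothetical game-day scenario: how to break it and how to restore it."""
    name: str
    inject: list[list[str]]   # kubectl commands that cause the failure
    revert: list[list[str]]   # kubectl commands that restore normal service
    time_limit_min: int = 30  # allotted time for the team to diagnose and fix


def scale_to_zero(deployment: str, namespace: str, normal_replicas: int) -> Scenario:
    """Take every pod of a deployment away; revert restores the usual replica count."""
    return Scenario(
        name=f"{deployment} has no running pods",
        inject=[["kubectl", "scale", "deployment", deployment,
                 "--replicas=0", "-n", namespace]],
        revert=[["kubectl", "scale", "deployment", deployment,
                 f"--replicas={normal_replicas}", "-n", namespace]],
    )


def drain_node(node: str) -> Scenario:
    """Evict workloads from a node and mark it unschedulable; revert uncordons it."""
    return Scenario(
        name=f"node {node} is unavailable",
        inject=[["kubectl", "drain", node, "--ignore-daemonsets"]],
        revert=[["kubectl", "uncordon", node]],
    )


def run(commands: list[list[str]], dry_run: bool = True) -> list[str]:
    """Print each command (and execute it only when dry_run is False)."""
    rendered = []
    for cmd in commands:
        line = shlex.join(cmd)
        rendered.append(line)
        print(("[dry-run] " if dry_run else "") + line)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    scenario = scale_to_zero("checkout", "shop", normal_replicas=3)
    run(scenario.inject)  # [dry-run] kubectl scale deployment checkout --replicas=0 -n shop
    # ... the primary driver investigates within scenario.time_limit_min minutes ...
    run(scenario.revert)  # [dry-run] kubectl scale deployment checkout --replicas=3 -n shop
```

Keeping the revert step next to the inject step means the game master can always end a scenario cleanly, even if the team runs out of time. Depending on your workloads, `kubectl drain` may need additional flags before it will evict every pod.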
Celebrate the Wins
Infrastructure initiatives are rarely minor or without hiccups. As you accomplish milestones, it’s important that you celebrate them with the whole team. It’s a long road to completing these migrations, so celebrating wins as you move workloads into production helps buoy the team’s spirits and keeps everyone motivated towards achieving your end goal.
Following every strategy in this post isn’t guaranteed to make your migration successful, but we’ve found with numerous customers that your chances are significantly higher if you do. Mitigating risk is important to any software project, but especially so when modifying the infrastructure that is the foundation of your business’s product. Being honest and upfront about the risks and discussing ways to mitigate them makes you far more likely to succeed.
Concerned about meeting your Kubernetes migration deadlines? Has your Kubernetes migration stalled? Come talk to us; we’re here to help!