K8S setup 2020 vs 2023 — toolset evolution
Hey guys, I’m back writing about K8S. My previous post described the decision process between ECS and EKS, with some pros and cons. This one covers the next step, which is often described as the complex part: deciding which components you are going to start off with in your cluster.
I would like to highlight that back in 2020, when I started the journey, I was aiming for the most basic and simple setup possible, so we could go live quickly, prove the concepts, and start getting benefits early.
I will jump straight to the list of components, with a quick description of each one:
kong ingress controller: responsible for the main entry point to our cluster, mapping paths to our apps and also managing the NLB exposed in our VPC. The team already had experience with Kong, so the transition would be easier.
external dns: responsible for creating the DNS records in Route53 that point to our NLB.
new relic bundle: responsible for the agents that ship metrics and logs to New Relic (company standard).
sealed secrets: responsible for encrypting/decrypting our secrets, allowing us to store everything in our repos safely.
cluster autoscaler: responsible for scaling up our node group when the pods requested by HPA no longer fit on the existing nodes.
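To make the HPA side of that last item a bit more concrete, here is a minimal Python sketch of the replica calculation described in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The function name, the min/max bounds, the 10% tolerance and the sample numbers are assumptions for illustration only, not values from our cluster.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation (illustrative sketch only)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    ratio = current_metric / target_metric

    # Close enough to the target: keep the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    # Scale proportionally, rounding up, then clamp to the configured bounds.
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical numbers: 4 pods at 90% average CPU with a 60% target.
    print(desired_replicas(4, 90, 60))
```

If the extra replicas do not fit on the current nodes, they stay Pending, and that is the signal the cluster autoscaler reacts to by adding nodes to the node group.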
We were able to reach around 30k RPM at peak with no problems. We had a great set of pipelines using AWS CodePipeline for CI/CD, and we used a GitOps strategy from the beginning, which helped us improve and change the setup later with no major disruptions. We didn’t start with Helm because it would have been too complex, with more things to learn and handle at the beginning, so a simple templating tool (YTT) plus a deployment tool (KAPP) was enough to keep track of changes.
Challenges and first changes
We had few issues with this setup, but the main one was the upgrade. I didn’t mention it, but we started with version 1.18, and during the first upgrade to 1.19 we felt some performance degradation due to the way our ingress was configured, plus missing guardrails such as PDBs (PodDisruptionBudgets).
So the first lessons learnt and changes:
- Deployed PDBs for all apps to avoid killing a considerable number of pods at the same time
- Set the minimum number of pods to 2 for all apps
- Changed the ingress controller to run with ALB and without kube-proxy (IP mode), so the traffic goes straight to the pods. This helped us handle the traffic switch and the graceful shutdown of pods better when upgrading the cluster.
- Introduced the AWS Load Balancer Controller to be able to use ALB
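As a rough illustration of the first two points, this is a minimal Python sketch that builds a PodDisruptionBudget manifest and shows why the minimum of 2 pods matters. The app name `my-app`, the `app` label, the `default` namespace and `minAvailable: 1` are hypothetical values, not our real configuration. It also assumes the `policy/v1` API, which is only available from Kubernetes 1.21; on 1.18/1.19 the same object lived under `policy/v1beta1`.

```python
import json


def build_pdb(app_name, min_available=1, namespace="default"):
    """Build a PodDisruptionBudget manifest as a plain dict (hypothetical names)."""
    return {
        "apiVersion": "policy/v1",  # assumption: cluster is on 1.21 or newer
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{app_name}-pdb", "namespace": namespace},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app_name}},
        },
    }


def allowed_disruptions(healthy_pods, min_available):
    """How many pods may be evicted voluntarily without breaking the budget."""
    return max(0, healthy_pods - min_available)


if __name__ == "__main__":
    print(json.dumps(build_pdb("my-app"), indent=2))
    print(allowed_disruptions(healthy_pods=2, min_available=1))
    print(allowed_disruptions(healthy_pods=1, min_available=1))
```

With 2 pods and `minAvailable: 1`, a node drain can evict one pod at a time while the other keeps serving. With a single pod the budget allows no evictions at all, so the drain would be blocked, which is one reason the two changes go together.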
This setup worked for a while, at least 1 year. But as soon as we started moving more traffic and bigger apps, we saw the need to improve the architecture and the tools and to add more features. By then it was also stable enough for us to make improvements.
Tip: more than 10 years ago I learnt to start projects with emergent architecture. It means you don’t need to start with everything perfect from the beginning; you can change it at any time, and you should prepare it for change. This is a nice article about it. So think about your current needs and future needs, pave the way, and plan when you can act on them. This will save you a lot of time. Keep dreaming.
Allow me to move on to the current state in 2023, 3 years later (almost 4). After the first PoC some things changed: new tools were added and some improvements were noticed. I’m going to list the new components below, with a quick description of each:
argo-cd: Responsible for the deployments. Other options like Flux were considered, but Argo has better integrations and UI.
istio: Responsible for our service mesh. Other options like Linkerd were considered, but at the time Linkerd didn’t have features for external egress or circuit breakers (it now supports circuit breakers).
cert-manager: Responsible for generating certificates for LBs and service mesh.
AWS load balancer controller: Responsible for the main entry point to our cluster, managing the ALB in AWS.
keda: Responsible for scaling up/down our deployments based on events from RabbitMQ.
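For the KEDA item, here is a simplified Python sketch of how queue-based scaling behaves: roughly one replica per N messages waiting in the queue, bounded by a minimum and a maximum, with the option of scaling to zero when the queue is empty. This is my approximation of the behaviour, not KEDA's actual code, and the function name, the target of 20 messages per replica and the bounds are assumptions for illustration.

```python
import math


def replicas_for_queue(queue_length, target_per_replica,
                       min_replicas=0, max_replicas=100):
    """Approximate replica count for a queue-length based scaler (sketch only)."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if queue_length < 0:
        raise ValueError("queue_length cannot be negative")

    # One replica for every `target_per_replica` messages, rounded up.
    wanted = math.ceil(queue_length / target_per_replica)

    # Keep the result within the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Hypothetical backlog sizes for a RabbitMQ queue.
    for backlog in (0, 45, 5000):
        print(backlog, replicas_for_queue(backlog, target_per_replica=20,
                                          max_replicas=30))
```

The nice part of this model compared to CPU-based scaling is that consumers scale on the work that is actually waiting, so an idle queue can cost nothing and a sudden backlog is picked up before CPU usage has a chance to rise.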
I would say this is not the final stage yet. There is a lot of potential for Helm to replace YTT for templating in the future. There is some room for Crossplane to manage the AWS infrastructure, which is currently managed outside the cluster using CloudFormation. There is also the option of Karpenter to handle the instances and improve the upgrade process.
Conclusion
Kubernetes has many tools available, and each one has its own purpose, but that does not mean you need all of them or that the ones listed here will be the right ones for you. Test them, read opinions from the community, join the Slack channels of these projects, and compare the options. Sometimes you don’t need everything from the first moment. Go easy, define your priorities, and weigh how to reach them in the short and long term against the benefits. Maybe the simple option will cover your needs for a while, and you can focus on something else until you really see that you need more. I hope it helps in your journey with Kubernetes!
Cheers!!