Kubernetes. K8s. Man, I remember those days when it was hot hot hot, and you were an idiot if you weren’t deploying something onto Kubernetes. If you didn’t know how to use kubectl then you were a prehistoric human, almost as bad as those SSIS users.
Well, I’ve had my fun in the sun. I used Kubernetes for a few years before leaving that cult for greener pastures.
You should also check out Statsig, the sponsor of the newsletter this week. Statsig automates experiment analysis either on top of your warehouse data or in their cloud, and they’re announcing a new feature every day this week.
Companies like Notion, OpenAI, Brex, and Anthropic use Statsig to power their experimentation and feature management. Statsig is celebrating experimentation week by shipping a new feature every day – today is Interaction Effect Detection, which provides peace of mind that two experiments aren’t interacting in an unexpected way. Click here to see the release post, and follow along with all of the launches this week.
It seems to me, as far as I can glean from putting ye olde ear to the ground, that there is far less “Kubernetes for Data” marketing drivel being poured down the throats of unsuspecting victims these days.
I think this is generally a good thing, although Kubernetes for custom data platforms is probably the only real option.
Let me explain.
Kubernetes for Data Platforms.
Let us wind back the clock a few years … say circa 2019. That’s just a few years after Kubernetes first appeared in mid-2014, with its 1.0 release landing in 2015.
It’s basically at the height of the hype train.
I mean, to be fair, anyone who has worked with pools of EC2 instances and half-baked tools like AWS Fargate will sacrifice their firstborn to the Kubernetes gods for good reason.
I myself, in my young glory days when I was full of bitterness and vitriol, used GKE (Google’s hosted K8s) to build a large Data Platform capable of processing hundreds of TBs of spatial data with many thousands of pods running in parallel. Sometimes K8s is the only real option (at least it was back in the day).
What does the maddened Reddit rabble have to say about K8s in a Data context (not that we should trust them for anything, since they are a fickle and bloodthirsty crowd)?
More or less, Kubernetes can play a few key roles.
Used to host third-party tools that are K8s-ready.
For example, many tools provide Helm charts to deploy their product.
Custom-built applications that need scale.
To host service-oriented data platforms.
This can, and will, usually end up on one end or the other of a spectrum of suckage depending mostly on the company and environment (and the dreaded Platform Engineering team).
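To make that first role concrete, here’s roughly what deploying a Helm-chart-ready tool looks like. The chart repo below is the real Apache Airflow one, but the release name, namespace, and values file are placeholder examples:

```shell
# Add the official Apache Airflow chart repository and refresh the index.
helm repo add apache-airflow https://airflow.apache.org
helm repo update

# Install Airflow into its own namespace with default chart values
# (release name "my-airflow" is an arbitrary example).
helm install my-airflow apache-airflow/airflow \
  --namespace airflow --create-namespace

# Later, upgrade the same release in place with a custom values file.
helm upgrade my-airflow apache-airflow/airflow \
  --namespace airflow -f my-values.yaml
```

That’s the whole pitch in three commands: a nontrivial multi-service tool, stood up without writing a single manifest by hand.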
What makes Kubernetes so attractive as a Data Platform, and for deploying “things” in the first place?
Because it’s so broad, it doesn’t care what is being deployed … as long as you can containerize … aka Dockerize your “thing” … there is a good chance you can run it on Kubernetes with low effort.
So what then you say? If you can simply package up your Data Platform tools into Docker and throw that bag over the wall onto a Kubernetes cluster… why doesn’t everyone do it???
When does Kubernetes suck?
While all the Platform teams sharpen their butter knives to put me out of my misery, I will call their bluff and tell you how some Platform Engineering teams drank the kool-aid and foot-gunned themselves into ding-dong land, all the while, destroying the productivity of hundreds of Software Engineers.
I will try to be succinct.
Travel with me (a real place I’ve been) to a 1,000-person growing Startup. Said startup is in full growth mode and thought they needed Google-scale with a few hundred users, and drank the kool-aid.
Kubernetes to the rescue.
The gate-keeping Platform Engineering team only allowed things to be deployed on Kubernetes, that’s it, no exceptions. K8s baby, ride or die.
No documentation, all tribal knowledge on how to get something deployed in their “special way.”
Any new “thing” that needed to run was 80% yaml files, 20% code.
You had better be a Service savant and not get too greedy with your resources; only 1/2 a CPU for you, bad data engineer!
Even if you needed a Postgres database … no RDS for you … you must deploy your own pod.
Large organization, slow to respond to tickets or requests for help, you were at their mercy.
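For contrast, that “deploy your own Postgres pod” mandate means owning a StatefulSet, a PersistentVolumeClaim, a Service, backups, and version upgrades yourself — versus something like this single managed-service call (the identifier, instance class, and credentials below are placeholder examples, not a recommendation):

```shell
# One AWS CLI call provisions a managed Postgres instance.
# Backups, patching, and failover become AWS's problem, not yours.
aws rds create-db-instance \
  --db-instance-identifier analytics-pg \
  --db-instance-class db.t3.medium \
  --engine postgres \
  --allocated-storage 100 \
  --master-username dbadmin \
  --master-user-password 'change-me'
```

That asymmetry in operational burden is the whole argument the Platform team was refusing to have.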
Eventually they came to see the light (and the costs) beaming down from on high, discovered that tools like AWS Lambda actually existed (I joke not), and loosened things up.
It was almost as if there was no other world but K8s; its dirty little fingers found their way into every single decision and design. If something needed to be done, it HAD to be done in the context of K8s.
This is not the way. Tools we use should enhance our design and software, not put it in a box.
The world has moved on.
Technology changes, not everything needs to be run on a single tool. Should you run Apache Spark on Kubernetes? What’s the matter with you?! It’s called Databricks.
Need Airflow?
It’s called AWS MWAA or GCP Composer or Astronomer for crying out loud. You don’t need to deploy your own Airflow setup on Kubernetes (yes I’ve actually done this).
Need a database? It’s called RDS.
It’s amazing that some people get so caught up in a single piece of tooling that they literally don’t notice technology has moved on and the market has provided better options.
This has nothing to do with “Kubernetes is bad”; it’s a simple fact that the universe we live in tends toward chaos and atrophy. New things get designed, and they inevitably do things better than the old things. That’s just life.
It isn’t trivial to run your own Kubernetes cluster, and it isn’t trivial to deploy complex services onto Kubernetes. Managed services have won the day; get used to it.
The problem with saying “Just deploy it on K8s,” is that it’s a naive way to approach solving problems. THERE IS NO FREE LUNCH, and K8s comes with its own set of problems.
Who’s managing the Kubernetes cluster(s)?
It’s “harder” for devs to work on something deployed on K8s than a simple managed service.
YAML and DevOps in a K8s-centric org become half the job of every single Software Engineer.
I wax poetic and long.
Long Live Kubernetes.
But, here we are at the end. Long Live Kubernetes! Long Live K8s! What a wonderful tool it really is.
It’s hard to put into words the impact K8s has had on the Software and Data Engineering world. The fact that so many tools provide Helm charts, and that you can deploy a complicated data orchestration tool onto Kubernetes with a few CLI commands, is amazing and speaks volumes to the power of K8s.
When you run into custom Data Platforms that need to be built like I did oh those many years ago … if it wasn’t for Kubernetes … I probably wouldn’t be here today. I would still be blithering over my keyboard in some dark and musty basement of a mental institution.
I doubt Kubernetes will be going anywhere FAST for the next decade. It will be here, back in the corner quietly and methodically hosting all those pods like Atlas holding up the world.
But I dare say there will be a slow decline. It’s the way. The amount of SaaS products available today and what they can do has taken a chunk out of the ole’ lunch of K8s.