Why I'm replacing Polars with DuckDB
... and other such tomfoolery
Well, I had my moment, the tipping or boiling point, I guess you would call it. I’ve had enough, and, in a moment of fury, I ripped Polars from its Lambda throne and supplanted it with DuckDB. Been a long time coming.
Look, before you feed me to the wolves, you need to get the whole picture. Don’t burn me at the stake or stick me in the dungeons of Castle de’cliff without a trial.
Let me spin you a tale, and you be the judge.
Ok, turn back the dials of time to circa 2022-2023. I’m usually an early adopter of technologies, and I pride myself on kicking the tires on new and shiny tools, taking people to task for what they say a “thing” does.
The proof is in the pudding, and one of the most popular articles I’ve written on my other blog was how to replace Pandas with Polars back in the Year of Our Lord 2023.
Heck, I put Polars in production workloads in those early days, when others were still watching from the sidelines.
Yeah, it had its rough edges from the beginning, but the speed of Polars' Rust-based design was out of this world, and it was the tool I’d been waiting a long time for. The ability to work on large datasets in a streaming manner, replacing Spark in many workloads, was a dream come true.
Come on, bro, this is just a small sampling of all the content I’ve written on Polars over the years.
What I’m trying to say is that I’ve used Polars for years in every possible situation. You know what I’ve learned over all that time?
There are two types of people and open source projects in the world: the ones that obsess about developer ease of use, non-breaking changes, taking issues seriously, and simply prioritizing kindness and openness, and the ones that don’t.
At the end of the day, we want tools, wait … we NEED tools that are reliable and stable, where the maintainers take ownership to another level and build with developers and their main use cases in mind.
Hey, I’m just a guy with an opinion; you can have yours. I’ve been fighting Polars issues in production for years now. Usually, I’m willing to just figure out a fix and move on, roll my eyes, etc.
I first got a bad taste in my mouth when I ran into memory issues, found the same problem on Polars' GitHub, only to have someone who wasn’t very nice close the issue as “not our problem.”
What made me switch from Polars to DuckDB?
I recently rebuilt some AWS Lambdas whose main job was to use Polars to read some data from S3, transform it, and write it back to S3. Ya’ know, I’m not a Luddite; I pinned my versions and built on the base AWS Python image.
Something like …
FROM public.ecr.aws/lambda/python:3.13
COPY . ./
RUN pip3 install polars==1.31.0
I was only updating some logic, nothing else. Little did I know I was in for another Polars Easter egg. Of course, I woke up the next day to find the Lambdas not working.
Should have been more rigorous in my testing, I suppose. But we get complacent, and that’s how it usually works. Fix or upgrade some logic, something totally unrelated goes pop.
Yeah, some might argue that this sort of thing is probably a consequence of dual evils: Python environments (packages tied to packages tied to packages… supply chain) and life in software.
In the perfect world, with the perfect set of tools and unlimited time to ponder the comings and goings of the various software engineering best practices, such things can be caught early and dealt with.
But I find myself stuck in reality, with competing priorities and limited time and resources; you do your best within the constraints given. This was the straw that broke the proverbial camel's back. With only a slight tingling of remorse, and with prejudice, I ripped Polars from production and replaced it with DuckDB.
I sleep better already.
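For the curious, here is a minimal sketch of what a read-from-S3, transform, write-to-S3 Lambda can look like in DuckDB. It is an illustration, not the actual production code: the handler name, the event keys (source_uri, dest_uri), the column names, and the aggregation are all hypothetical, and it assumes the Lambda execution role supplies the S3 credentials.

```python
def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_copy_sql(source_uri: str, dest_uri: str) -> str:
    """Build one COPY statement that reads Parquet, aggregates, and writes Parquet."""
    # Hypothetical transform: total `amount` per `customer_id`.
    return (
        "COPY ("
        "SELECT customer_id, SUM(amount) AS total_amount "
        f"FROM read_parquet({sql_literal(source_uri)}) "
        "GROUP BY customer_id"
        f") TO {sql_literal(dest_uri)} (FORMAT parquet)"
    )


def handler(event, context):
    """Lambda entry point; expects hypothetical source_uri and dest_uri event keys."""
    # Imported here so build_copy_sql can be tested without DuckDB installed.
    import duckdb

    con = duckdb.connect()  # in-memory database, nothing persisted
    # Lambda's filesystem is read-only outside /tmp, so DuckDB needs a
    # writable home directory before it can install extensions.
    con.execute("SET home_directory = '/tmp'")
    for extension in ("httpfs", "aws"):
        con.execute(f"INSTALL {extension}")
        con.execute(f"LOAD {extension}")
    # Assumption: credentials are resolved from the Lambda execution role.
    con.execute("CREATE SECRET (TYPE s3, PROVIDER credential_chain)")
    con.execute(build_copy_sql(event["source_uri"], event["dest_uri"]))
    return {"status": "ok", "dest_uri": event["dest_uri"]}
```

The whole job collapses into a single COPY statement, so the data never has to pass through a Python DataFrame. Keeping the SQL builder separate from the handler also means the query text can be checked in a unit test without touching S3.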







