Love is a hard thing. Love requires work, patience, longsuffering, and the ability to stick through the hard times, the bitter times. That’s what it’s like to love Rust.
Rust is a jealous lover. Whenever you find yourself writing Python or some other language … well, life just seems dull. There is no spice, no excitement. But we have to be careful. We can’t just fall head over heels into infatuation, mistake it for love, and end up wasting our time and energy on a lover who doesn’t give us anything in return.
This is where I find myself with Rust. I picked it up just because ThePrimeagen seemed impressed with it. I struggled for a day or two, and since then I have slowly started using it more and more.
I've found myself becoming angry with my plain old lover Python. Angry about packaging, angry about random import errors, cursing it, asking it why it can’t be more like Rust.
Today I want to talk about Rust in a Data context. Where is it being used, can it actually be used for Data Engineering, and what does the future look like?
Thanks to Delta for sponsoring this newsletter! I personally use Delta Lake on a daily basis, and I believe this technology represents the future of Data Engineering. Check out their website below.
Let’s dive in.
A Rusty Love Story.
As many of my long-time readers probably already know, I’m the eternal skeptic when it comes to anything programming and Data Engineering related. I think everything has its time in the sun, things come and go, rise and fall. Everything has its place, and there is a place for everything.
Rust has been a “hot” topic of late in every tech circle, and Data Engineering is no exception. Everyone was rightly skeptical until tools like Polars took the DE community by storm.
Rust is one of those languages that takes time. Time to learn. Time to like. It has a little bit of a learning curve to it, and honestly, right up front, that limits the reach that Rust has and ever will have in the Data Engineering community.
In a world where Python reigns supreme, Rust is a hard sell to 90% of the everyday Data Engineers writing pipelines on a daily basis.
We are going to talk about why, but first, here are a few articles I’ve previously written on the subject of Rust + Data Engineering.
Ownership and Borrowing in Rust – Data Engineering Gold Mine.
DataFusion courtesy of Rust, vs Spark. Performance and other thoughts.
Delta Lake without Spark (delta-rs). Innovation, cost savings, and other such matters.
Dataframe Showdown – Polars vs Spark vs Pandas vs DataFusion. Guess who wins?
Working with Cloud Storage (s3). Golang vs Rust vs Python. Who shall emerge victorious?
Thoughts on Saint Augustine, Rust vs Golang. Complexity, verbosity, and other matters.
Now that you’ve read all my wonderful articles … ahem … yeah, you’re now a semi-expert in Rust in a Data Engineering context. But let’s talk about two things:
Reasons (data) people love Rust.
Reasons Rust will not be (Data Engineering) Mainstream.
Let’s dive in.
Why (data) people Love Rust.
Love is a hard thing to define. What exactly is it about Rust that people love, and more specifically, Data people? If you have Golang, Scala, Java, and all the rest, what does Rust have that has turned it into a hot topic?
I mean, we’ve started to see Rust pick up some serious traction when it comes to tools like Ruff, delta-rs, Polars, etc. It kinda makes you wonder what’s coming next. Maybe it isn’t all just hype and there is something to the Rust bandwagon. Or maybe not.
As someone who’s used Rust off and on for 6 months in a Data Engineering context, here’s what I think. I want to present this information in such a way that someone who’s never written a line of Rust can grasp why some (data) people are choosing it.
Here are the top few reasons.
cargo (packaging and dependency management)
fast (blazingly fast)
not terribly verbose (learnable)
memory model, immutability, and static typing
Let’s unpack each one.
cargo (depends + packaging)
Probably one of the parts of any language least appreciated by newcomers to programming is the packaging and dependency management of its ecosystem.
For example, in Python, tools like pip give you the illusion of being in control and of being easy to use, and in some sense they are. But it’s clear to anyone who has used Python for an extended period of time that there are massive problems with Python’s packaging system that can have serious production repercussions.
a tangled web of package versions
the ability to not be specific with versions
things stop working randomly at some point in the future.
When you start writing Python for production use cases that you depend on, you will inevitably start fighting versions at some point. Things WILL break somewhat randomly, and even baking packages into things like Docker images can and will eventually bite you.
Is it possible to do everything perfectly and keep things from breaking? Sure, if you have a team of Ops Engineers whose sole responsibility is to prevent such things (I’ve worked in that environment, and it still fails sometimes).
Enter cargo, the Rust package manager. What do you actually do with it that makes Rust such a pleasure to work with?
The everyday cargo commands may seem simple, and they are … that’s the point. Want to create a new project with everything you need? `cargo new {project}` will do the trick.
Need to add some crates to support your requirements? Good ol’ `cargo add {crate}` will be your go-to.
Ready to build and run your project? It’s as simple as `cargo build` or `cargo run`.
Honestly, the ease of use and bulletproof nature of cargo make working with Rust a pleasure and a breath of fresh air in a Python world (I say that as someone who still writes Python on a daily basis).
Rust is (blazingly) fast and not verbose.
I’ve done my fair share of Python vs Rust blog posts that get everyone angry and yelling at me. Yet, at the same time, everyone really knows that there is no comparison.
Besides a few angry holdouts on the internet, most folks are not going to argue with you about how Rust is faster at most things than most other languages.
The thing about Rust being blazingly fast is that it isn’t like writing C or C++ to get that performance. You can write code that is legible and understandable, and yet at the same time is faster than fast.
It’s like you get your cake, and you get to eat it too.
For example, on my GitHub I have some code where I compared Python to Rust inside an AWS Lambda. Let’s say I want to download the contents of some file(s) from AWS S3.
There is no magic about that Rust code. It’s fast, it’s readable, and it’s pretty clear what’s going on. Sure, like any language it’s going to have its initial learning curve, but it is approachable and doable. The proof is the fact that if I can write Rust that works, so can you.
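The Lambda code itself lives on GitHub and isn’t reproduced here. To give a feel for the “readable and still fast” claim, here is a minimal, standard-library-only sketch. It is not the Lambda code and makes no AWS calls: the `key,size` line format and the `total_bytes` function are hypothetical, made up purely for illustration.

```rust
// Hypothetical stand-in example: std only, no AWS SDK involved.
// Each input line is assumed to look like "object_key,size_in_bytes".

/// Sum the byte sizes of all well-formed lines, skipping malformed ones.
fn total_bytes(listing: &str) -> u64 {
    listing
        .lines()
        // Keep only lines that contain a comma, split into (key, size).
        .filter_map(|line| line.split_once(','))
        // Keep only sizes that parse as an unsigned integer.
        .filter_map(|(_key, size)| size.trim().parse::<u64>().ok())
        .sum()
}

fn main() {
    let listing = "raw/2023/01.csv,1200\nraw/2023/02.csv,3400\nbroken line\n";
    println!("total bytes: {}", total_bytes(listing));
}
```

Even if you’ve never written Rust, the chain reads top to bottom: split into lines, keep the well-formed ones, parse the sizes, add them up.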
Rust is fast, that’s why some data people love it.
Memory model, immutability, static typing.
Ok, before you roast me, I know not everything listed is unique to Rust, but the point is ALL these things exist together as a unit, the crates go with the memory model go with the static typing. It’s a package deal, my friend.
Probably one of the biggest learning curves with Rust is the Ownership and Borrowing memory model. It still trips me up. But it also protects me from myself, and I need that, let me tell you.
Folks who’ve been around the data world, fought production bugs, and tried to make life easier will understand the importance of …
static typing
memory safety
immutability
Two of the three aren’t specific to Rust: immutability and static typing can be found in many other languages. But Rust does shine in its memory safety (ownership and borrowing) model.
Simply knowing something is immutable or not makes programming and debugging large and complex code bases easier.
Static typing, as in many other languages, hardens the development process by catching at compile time errors that would only show up at runtime in a language like Python.
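To make the immutability and static typing points concrete, here is a small sketch. The `Reading` struct and `parse_reading` function are hypothetical names invented for this example; the point is only that bindings are immutable unless marked `mut`, and that bad input has to be handled as an explicit `Result` rather than blowing up later.

```rust
// Hypothetical record type, for illustration only.
#[derive(Debug, PartialEq)]
struct Reading {
    sensor: String,
    value: f64,
}

/// Parse "sensor=value". Bad input becomes an explicit Err the caller must handle.
fn parse_reading(raw: &str) -> Result<Reading, String> {
    let (sensor, value) = raw
        .split_once('=')
        .ok_or_else(|| format!("missing '=' in {raw:?}"))?;
    let value: f64 = value
        .trim()
        .parse()
        .map_err(|e| format!("bad value in {raw:?}: {e}"))?;
    Ok(Reading { sensor: sensor.trim().to_string(), value })
}

fn main() {
    let reading = parse_reading("temp=21.5").unwrap(); // immutable by default
    // reading.value = 0.0; // would not compile: `reading` is not declared `mut`

    let mut count = 0; // mutation has to be opted into with `mut`
    count += 1;

    println!("{} = {} (readings seen: {count})", reading.sensor, reading.value);
    println!("{:?}", parse_reading("temp=hot")); // an Err value, not a crash
}
```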
The below code has a memory bug that will not let the code compile. Rust will throw an error when trying to build.
Did you find it? Keep looking.
Borrowing and ownership make Rust bulletproof and very scalable. Rust lets you be specific in your code about whether a method or function “owns” the thing it’s using or is only “borrowing” it … it’s a very powerful paradigm.
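As a minimal sketch of owning versus borrowing (a made-up example, not the listing referred to above; `consume` and `count_rows` are hypothetical names), the code below compiles as written. The commented-out last line shows the kind of use-after-move mistake the compiler rejects.

```rust
/// Takes ownership: the caller's vector is moved in and dropped when this returns.
fn consume(rows: Vec<String>) -> usize {
    rows.len()
}

/// Borrows: the caller keeps ownership and can keep using the data afterwards.
fn count_rows(rows: &[String]) -> usize {
    rows.len()
}

fn main() {
    let rows = vec!["a,1".to_string(), "b,2".to_string()];

    let borrowed = count_rows(&rows); // `rows` is only lent out
    println!("borrowed count: {borrowed}, first row still usable: {}", rows[0]);

    let owned = consume(rows); // `rows` is moved into `consume` here
    println!("owned count: {owned}");

    // println!("{}", rows[0]); // error[E0382]: borrow of moved value: `rows`
}
```

Uncomment that last line and `cargo build` refuses to produce a binary, which is exactly the “protects me from myself” part.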
Why do (some data people) love Rust?
Rust is an easy choice for Data Engineering folk who are looking to build fast, scalable tooling for others to use. Is it helpful for the everyday Data Engineering task? Not really.
Yes, it will make you a better programmer, you will have fun, and you will create cool things, but most of the day-to-day work will never be done in Rust per se.
The future is probably more like the Polars way: some smart Engineers build tooling in Rust, then wrap it with Python for others to use.
I think you should learn and use Rust.
It’s not that hard to learn (after the first two days).
It’s (blazingly) fast.
It’s starting to be used on the periphery of Data Engineering.
People will think you’re smarter than you are.
It will make you a better engineer.
I dare you to try Rust for one week. You will fall in love.
Very enjoyable read, Daniel! And yes, everything you say is sooo true, eh?! When you get a chance, check out Frank McSherry's blog: https://github.com/frankmcsherry/blog. He hasn't updated it in a while but he has some goodies on his trajectory for Rust, eh?! :)