New Mojo Lang for Data Engineering
What's all the hype? Is it real for the average Data Engineering workload?
I’m not sure the new Mojo lang has what it takes. It’s hard to say. I haven’t even used it yet. They sure are trying hard and have been doing lots of marketing and hype-pushing.
If you live in the forest under moss and rocks like a hermit, don’t we all wish, and haven’t heard of the new Mojo language, well good on you. But the rest of us are putting up with it.
Could it possibly be true? Someone has replaced Python with a fast version of Python that includes static typing and immutability … but yet it is still Pythonic and interoperable with Python as we know it?
Seems like a tall order.
“Mojo is a new programming language that bridges the gap between research and production by combining Python syntax and ecosystem with systems programming and metaprogramming features. Mojo is still young, but it is designed to become a superset of Python over time.” - Mojo
Still, you think about how popular Rust has become, but it has been a slow burner, rising up through the ranks and finally breaking through in all its glory.
Summarizing Mojo so you don’t have to.
I'm not really interested in doing some “hello world” crud with Mojo. There is plenty of content starting to come out. What I’m going to do, if I can, is try to see what Mojo means for the Average Data Engineer.
I will give a little time to the approach of Mojo, and what it feels like as a language, but I want to focus on its application for general Data Engineering, and if we should care about it.
What’s the deal with Mojo?
Here is some stuff about Mojo that you should know about.
You can import and write normal Python modules/packages and code.
It looks and feels like Python.
Mojo wants to bring “systems programming” to Python.
Mojo adds `let` and `var` variable declarations to give immutability, or not.
Mojo adds those famous `structs`.
Static typing, with checks at compile time, is available.
Addition of `fn`s alongside `def`s for more immutable, static, and generally stricter functions.
Mojo brings the concepts of borrowing and ownership to `fn`s, a Rustacean sort of twist on Python.
Here is a quick and dirty example of the above.
let a = 123  # <- this is immutable ... can't be changed later.
var b = 'ABC'  # <- this is mutable, can be reassigned at any time.
from PythonInterface import Python
let pd = Python.import_module("pandas")  # <- voilà, Python as you know it.
# structs ... aka Classes.
struct Bing:
    var ding: Int
    var dong: Int

    fn __init__(inout self, ding: Int, dong: Int):
        self.ding = ding
        self.dong = dong

    def hey(inout self) -> None:
        print(self.ding+self.dong)

var bing = Bing(1,2)
bing.hey()
>> 3
Initial Thoughts.
My initial thoughts are a little all over the place. One part of me says if you’re going to be something, be something … but Mojo seems to take the middle ground. It’s neither Python nor Rust, it’s both, they had a baby and called it Mojo. Which is surprisingly confusing.
Honestly, I found it harder to write Mojo the first time than when learning Rust. Some of the concepts are too muddied together, like the `struct`, which is basically a Python class, except not: it is stricter, and its definition has to be complete at compile time.
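To make the "definition has to be complete at compile time" point a bit more concrete, here is a minimal sketch in plain Python, not Mojo. It contrasts a normal class, which happily accepts new attributes at runtime, with a `__slots__` class, which is only a rough Python stand-in for the fixed layout of a struct. The names `SlottedBing` and `try_add_attribute` are made up for illustration.

```python
class Bing:
    """Plain Python class: new attributes can be attached at any time."""

    def __init__(self, ding: int, dong: int) -> None:
        self.ding = ding
        self.dong = dong


class SlottedBing:
    """__slots__ fixes the attribute set up front (a rough analogy to a struct)."""

    __slots__ = ("ding", "dong")

    def __init__(self, ding: int, dong: int) -> None:
        self.ding = ding
        self.dong = dong


def try_add_attribute(obj: object) -> bool:
    """Return True if a brand-new attribute can be attached at runtime."""
    try:
        setattr(obj, "extra", "added later")
    except AttributeError:
        return False
    return True


flexible = try_add_attribute(Bing(1, 2))      # regular class accepts it
fixed = try_add_attribute(SlottedBing(1, 2))  # slotted class rejects it
print(flexible, fixed)
```

The analogy is loose: `__slots__` is still enforced at runtime, whereas the struct rules are checked before the program ever runs.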
The introduction of immutability via the `let`s was fine and welcome, but the strange implementation of structs that act more like classes, combined with the muddled mix of static, immutable, borrowed, and mutable concepts, requires strangeness like the `inout` you see above.
In essence, Mojo has managed to neither be Python nor Rust, but a strange mix, that appears to be more confusing than anything.
I have a funny feeling Python developers are going to shy away from the ideas and syntax, mostly because they are using Python for a reason … that reason being flexibility and ease of use. If they wanted compile time type checking … they would be using Golang, Rust, or Scala already.
I sort of get it.
Look, we all get it. Python is its own worst enemy. It’s beautiful and expressive, allowing rapid prototyping and the like. Hence the entire ML world is carried on its shoulders.
But it’s also slow, and error- and bug-prone. It appears Mojo is trying to solve those problems: give you speed, plus immutability and static typing, along with all of Python's benefits.
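For context on the "bug-prone" part, here is a small plain-Python sketch (again, not Mojo) of the gap being described. Python can already opt in to immutability with a frozen dataclass, but its type hints are not enforced at runtime, so a wrong type slips through unless you run an external checker. The `Record` class and its fields are hypothetical, invented for this example.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Record:
    """Hypothetical row type; frozen=True makes instances immutable."""

    record_id: int
    name: str


def mutation_blocked(record: Record) -> bool:
    """Return True if assigning to a field raises FrozenInstanceError."""
    try:
        record.name = "changed"
    except FrozenInstanceError:
        return True
    return False


good = Record(record_id=1, name="a")
frozen_ok = mutation_blocked(good)

# The annotation says int, but nothing stops a str at runtime.
bad = Record(record_id="not-an-int", name="b")
hint_ignored = isinstance(bad.record_id, str)

print(frozen_ok, hint_ignored)
```

So immutability is available if you ask for it, while the typing is advisory only. That second half is the part a compile-time check is meant to close.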
Annoying.
The most annoying part about trying to learn Mojo is the horrible documentation. I mean, as someone who’s been writing Rust, I understand a lot of the concepts and how, in general, they improve performance and reduce bugs. But trying to suss out even how to use the structs, or scrolling through the built-in modules specific to Mojo (not Python), was a terrible experience.
For example, we Python folks and Data Engineers use lists quite often. Apparently, Mojo gives us a custom ListLiteral … when I would use it, how I would use it, and why I should use it are a complete mystery.
Apparently, the only things Mojo wants you to care about are …
it is faster than Python (like that takes a lot).
it has immutability and static typing available.
I guess everything else is just a thing you might not care about.
Again, I was scrolling through the modules trying to figure out file reading for example, maybe something in the IO module where we could read a CSV file and then be able to do something with it?
Yeah, no, apparently you can only print from this module.
Mojo for Data Engineering.
Well, I’m not really sure where to go from here. I will try to stay positive and not be Gandalf Storm Crow about the whole thing. There are some parts of Mojo that of course will be helpful to Data Engineering, as they would to programming in general.
immutable vs mutable.
static typing options.
structs.
borrowing and references and the like.
it’s fast …?
I don’t know, those are the reasons I use Rust, and it’s a joy to write. Mojo is not, it’s a confusing amalgamation. If I want Python flexibility I will use Python, if I want fast, immutability, etc I will use Rust. Because those tools focus on “their thing” and do it well.
Mojo doesn’t do either well.
“We expect that folks who are used to C++ and already use MyPy-style type annotations in Python to prefer the use of `fn`s, but higher level programmers and ML researchers to continue to use `def`.” - Mojo
So what’s the point, to force both people to use the same language?
Let’s try a Data Engineering Task in Mojo.
Ok, let’s not. I wrote enough Mojo to decide against it. It just feels strangely wrong writing it. Is it because I write both Python and Rust, so this feels like heresy? Probably.
Go read the docs for Mojo yourself.
I did try to do some normal data engineering for you.
Tried reading some files from S3 to do some stuff, no go. Mojo couldn’t handle boto3.
Uploaded a CSV file and tried to read it with Mojo, no go, Mojo crashed.
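For reference, this is roughly the kind of task I mean, sketched in plain Python with only the standard library. The sample data, the `amount` column, and the `total_amount` function are all made up for illustration; this is not the actual file or code I tried in Mojo.

```python
import csv
import io

# Hypothetical sample data standing in for an uploaded CSV file.
SAMPLE_CSV = "id,amount\n1,10.5\n2,4.5\n"


def total_amount(csv_text: str) -> float:
    """Parse CSV text and sum the 'amount' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(float(row["amount"]) for row in reader)


total = total_amount(SAMPLE_CSV)
print(total)
```

Nothing fancy: read rows, do something with them. That is the bar for everyday Data Engineering work.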
After reading the rest of the documentation, I’ve come to the conclusion that if you would like to build your own Machine Learning models and algorithms from scratch, Mojo might be your thing. Otherwise, steer clear. Maybe they will fix some stuff before the final release.