Some of the most underutilized pieces of code I’ve seen in all my many years of Data Engineering are Bytes and Streams. I’m not sure why; they just never appear.
I see Strings, Ints, Floats, I see everything, but never a plain old Byte. Poor little bugger. Maybe people think bytes are too complicated, but in fact they are less complicated: less to go wrong, less complexity.
What’s more computationally expensive than Serialization and Deserialization, especially in a Data Engineering context? Lots of data is moving around, coming from this place and going to that place. Does it really need to be a String all the time? No.
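To make that point concrete, here is a minimal Python sketch, assuming the job only needs to move data from one place to another without inspecting it. `copy_as_text` and `copy_as_bytes` are hypothetical helpers, and in-memory `io.BytesIO` objects stand in for real files or sockets. Both produce identical output; the text version simply pays for a decode and an encode on every line.

```python
import io

def copy_as_text(src: io.BytesIO, dst: io.BytesIO) -> None:
    # Hypothetical helper: round-trips every line through str,
    # decoding on the way in and encoding on the way out.
    for line in src:
        text = line.decode("utf-8")
        dst.write(text.encode("utf-8"))

def copy_as_bytes(src: io.BytesIO, dst: io.BytesIO, chunk_size: int = 64 * 1024) -> None:
    # Hypothetical helper: moves raw bytes in fixed-size chunks,
    # with no decoding or encoding at all.
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)

# Illustrative payload; any valid UTF-8 data round-trips the same way.
payload = "id,name\n1,alice\n2,bob\n".encode("utf-8")

text_out = io.BytesIO()
copy_as_text(io.BytesIO(payload), text_out)

bytes_out = io.BytesIO()
copy_as_bytes(io.BytesIO(payload), bytes_out)

print(text_out.getvalue() == bytes_out.getvalue() == payload)  # True
```

The standard library's `shutil.copyfileobj` performs the same kind of chunked byte copy as `copy_as_bytes`, so in practice you rarely need to write that loop yourself.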
Let’s take a look at Bytes, Streams, and Buffers in Python and Rust.
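As a quick orientation, here is a small standard-library Python sketch. It assumes one reasonable mapping of the three terms onto Python types: `bytes` for raw data, `io.BytesIO` as an in-memory stream, and `memoryview` as a buffer that gives a zero-copy view over existing memory.

```python
import io

# Bytes: an immutable sequence of integers in the range 0-255.
raw = b"\x00\x01\x02hello"
print(raw[0], raw[3:])  # 0 b'hello'

# Stream: a file-like object you read from (or write to) incrementally.
stream = io.BytesIO(raw)
header = stream.read(3)  # first 3 bytes
body = stream.read()     # everything that is left
print(header, body)      # b'\x00\x01\x02' b'hello'

# Buffer: a view over existing memory, no copy made.
data = bytearray(b"hello world")
view = memoryview(data)
view[0:5] = b"HELLO"     # writes straight through to `data`
tail = view[6:]          # slicing a memoryview does not copy either
print(data)              # bytearray(b'HELLO world')
print(bytes(tail))       # b'world'
```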
Thanks to Delta for sponsoring this newsletter! I personally use Delta Lake on a daily basis, and I believe this technology represents the future of Data Engineering. Check out their website below.
Subscribe to Data Engineering Central to keep reading this post and get 7 days of free access to the full post archives.