Great piece!
Last time I asked why we keep going full cloud / compute-heavy / Spark-based, someone told me: “because it’s cheap and it runs by itself.”
I can’t help but wonder — maybe most CTOs still prefer having a shiny “modern data platform” that seems to run effortlessly, rather than investing real time and energy in optimizing it with tools like the ones you mention.
💯 we can still dream though
Nice. The poll results say it all: 53% responded "I wish."
Well, we all kinda knew that was going to happen, didn't we??!! Hope springs eternal, they say.
I have found over the years that using a single node works great in Databricks for many jobs. It was nice when they finally surfaced that as an option. Hopefully this will get easier with time. Essentially as a developer, I don't want to have to care what the hardware is underneath. I just want it to work reliably and not cost too much. I think this is the promise of serverless.
Daniel, I am wondering if you have seen SQLFrame (https://github.com/eakmanrq/sqlframe)? I ran across it the other day, and it seems like a pretty elegant way of maintaining the same portable code that can run against multiple data backends.
That one is new to me. Hard to keep track of 'em all.