Remember in high school when all your friends kept telling you to stop spending so much time with that person? To break up, get it over with, that you were no fun anymore and never around?
Well, I’m here to tell you the same thing again. Think of me like your mother, standing over you at the supper table, shaking my finger at you, “You need to break up!!”
“It’s time. Time to break up with SQL.”
Heresy, I know. I can see the angry mob lighting their torches and sharpening their pitchforks and pikes as I speak the words. My days are numbered. Here is what we will cover today.
Why you should break up with SQL.
How SQL is holding you back.
Broadening your horizons.
The importance of the right tool at the right time.
Closing Thoughts
Why you should break up with SQL.
Things come and go, and sometimes they come back around again. That is the case with SQL. I remember a decade ago, well before the days of Databricks, Redshift, and Snowflake, when the cool kids were running Data Warehouses on SQL Server, Oracle, and SAP.
Ahhh … the good ole’ days … or not.
“With the popularity of SparkSQL and Snowflake, pretty much every Data Engineering tool is following the bandwagon, offering a SQL API to allow for easier adoption and usage.”
SQL here, SQL there, SQL SQL everywhere. Ahhh! Look, I know this is going to rub some people the wrong way, I get it. There was a time in my life when 80%+ of the work I did was SQL related. But we have to be honest with ourselves about the side effects of such SQL-heavy work.
I want to be clear … SQL is awesome and lowers the barrier to entry for Data Engineering. It also makes iteration and development very fast. There is a reason everyone uses it. But you do need to break up with SQL. Let me tell you why.
SQL-only stacks stifle innovation.
SQL-only stacks block the growth and learning in DEs.
SQL-only stacks become inflexible over time.
SQL-only stacks don’t support Machine Learning and DS workflows well.
SQL-only stacks tend to have fewer or no tests (unless you’re using DBT).
SQL-only stacks create overly complex logic and slow down debugging.
SQL-only stacks tend NOT to favor functional reusable code.
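To make the testing and reusability points in the list above a bit more concrete, here is a minimal sketch in Python. The function names (normalize_email, dedupe_emails) and the row shape are made up for illustration and are not from any real pipeline. The idea is simply that a small, pure function is easy to reuse and easy to pin down with a test, which is harder to do with a few hundred lines of inline SQL.

```python
from __future__ import annotations

from typing import Iterable


def normalize_email(raw: str) -> str:
    """Trim whitespace and lowercase an email address."""
    return raw.strip().lower()


def dedupe_emails(rows: Iterable[dict]) -> list[dict]:
    """Keep the first row seen for each normalized email (hypothetical rule)."""
    seen: set[str] = set()
    result = []
    for row in rows:
        email = normalize_email(row["email"])
        if email not in seen:
            seen.add(email)
            # Copy the row so the caller's input is never mutated.
            result.append({**row, "email": email})
    return result


def test_dedupe_emails() -> None:
    rows = [
        {"id": 1, "email": " Ann@Example.com "},
        {"id": 2, "email": "ann@example.com"},
        {"id": 3, "email": "bob@example.com"},
    ]
    assert dedupe_emails(rows) == [
        {"id": 1, "email": "ann@example.com"},
        {"id": 3, "email": "bob@example.com"},
    ]


test_dedupe_emails()
```

The same normalize_email helper can be imported by any other job that touches emails, and the test runs in milliseconds without a warehouse connection.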
“These ideas of course are not blanket statements for every Data Team but generally hold true in the long run and in most cases. You need to break up with SQL and tell that old rascal you want to explore other ideas and code to solve problems.”
Sure, there are teams here and there who are heavily invested in DBT and can overcome such issues as testing on a pure SQL stack, but those are the exceptions. You should break up with SQL because it’s holding you back. Let’s explore this idea a little more.
How SQL is holding you back.
It was a hard pill for me to swallow at first, but I did finally come to the realization that writing only SQL was going to hold me back. In the long run, it can only get you so far.
“If you’re writing 80%+ SQL all day long every day, with a smattering of Python here and there, you might be working yourself into a corner that’s hard to back out of.”
You don’t want to find yourself in the spot, whether by your own choice or not, where you need a new job and have to cut your prospects in half because all you know, and all you are good for, is some SQL.
The rise of Machine Learning and Data Science requires more programming experience.
The popularity of Databricks and Spark has increased the demand for PySpark and Scala.
The rising usage of Streaming pipelines requires more programming experience.
Most startups require broad tech stack experience, DevOps, CI/CD, and cloud services … via code.
The less you code, the rustier you’re going to be unless you're at the tail end of your career.
All I’m saying is that too much focus on anything in life can come back to bite you. Think about becoming a better programmer, with more than one language, as an insurance policy, an investment in yourself.
Not breaking up with SQL, and not being honest with yourself that you’re in a comfortable spot, too comfortable, is a bad place to be. Don’t let that monstrous blob of SQL slowly and quietly slide and slip over you, sucking you down into the depths, where you raise your head up 3 years later and wonder where the rest of the Data Engineering world went.
Broadening your horizons.
It’s time to spread your wings a little. Move past SQL: put it on your resume, polish it proudly, and move on to the next thing to master. Ensuring you are good at more than just SQL is key to your success and to standing out from the crowd.
Let me give you a few practical ideas to help guide you on your path past SQL.
Learn Docker and Docker-compose.
Learn more about Data Modeling and partitions.
Learn about functional programming.
Learn about idempotency in data pipelines.
Learn your way around Linux bash and servers.
That should keep you busy for a little bit anyway. On top of learning some valuable new skills, think about how good those techs will look on your resume. You will be much more marketable, and it will put you at the top of the stack of candidates.
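If "idempotency" and "partitions" from the list above sound abstract, here is one rough sketch of what they can look like in plain Python. Everything here is an assumption for illustration: the write_partition and read_partition helpers, the date=... folder layout, and the id/amount columns. The point is that a rerun for the same day replaces that day's output instead of appending to it, so running the job twice leaves the data exactly as it was after the first run.

```python
from __future__ import annotations

import csv
import tempfile
from pathlib import Path


def write_partition(base_dir: Path, run_date: str, rows: list[dict]) -> Path:
    """Write one day's rows to a date partition, replacing any earlier output."""
    partition = base_dir / f"date={run_date}"
    partition.mkdir(parents=True, exist_ok=True)
    target = partition / "data.csv"
    tmp = partition / "data.csv.tmp"
    with tmp.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "amount"])
        writer.writeheader()
        writer.writerows(rows)
    # Swap the finished file into place so a rerun overwrites instead of appending.
    tmp.replace(target)
    return target


def read_partition(path: Path) -> list[dict]:
    """Read a partition file back as a list of rows (values come back as strings)."""
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))


with tempfile.TemporaryDirectory() as tmp_dir:
    base = Path(tmp_dir)
    rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]
    first = read_partition(write_partition(base, "2024-01-01", rows))
    second = read_partition(write_partition(base, "2024-01-01", rows))
    assert first == second  # running twice leaves the same data
    assert len(second) == 2  # no duplicated rows
```

Real pipelines usually do this with a table format or object store rather than local CSV files, but the design choice is the same: key the output by the run's partition and overwrite it.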
Importance of the right tool at the right time.
Honestly, at this point, you might think I’m a downer on SQL, but I assure you that is not the case. I spent years and years of my life mastering SQL, and it comes into use on a nearly weekly basis. SQL is everywhere, even in Spark with SparkSQL, and being able to solve simple problems quickly and fiercely with SQL is a wonderful skill.
“Learn SQL, learn it well, but then move on.”
Where I get down on SQL is when the Data Stack has no variety and is pure SQL from end to end. That is when I get a little skeptical. Either it’s a boring job that I don’t want to do, or someone has no creativity and is trying to put a screw in the board with a hammer.
I’ve seen the overuse of the following SQL and relational database functionality.
stored procedures
triggers
hundreds of lines of SQL for a single query
array, JSON, and other data parsing.
blob and text storage.
document storage
I’m not saying you can never do these things with SQL, I’m saying there is more to life than SQL, and a lot of good tooling has been developed to deal with certain issues more efficiently.
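As one small example of the JSON parsing item in the list above, here is a sketch of flattening a nested event in Python instead of with a pile of SQL string and JSON functions. The event shape (order_id, items, sku, qty) and the extract_order_items name are invented for illustration, and the "missing quantity means 1" default is an assumption, not a rule from any real system.

```python
from __future__ import annotations

import json


def extract_order_items(raw_event: str) -> list[dict]:
    """Flatten one nested order event into one flat row per item."""
    event = json.loads(raw_event)
    return [
        {
            "order_id": event["order_id"],
            "sku": item["sku"],
            "qty": item.get("qty", 1),  # assumption: a missing quantity means 1
        }
        for item in event.get("items", [])
    ]


raw = '{"order_id": 42, "items": [{"sku": "A1", "qty": 2}, {"sku": "B7"}]}'
flat_rows = extract_order_items(raw)
assert flat_rows == [
    {"order_id": 42, "sku": "A1", "qty": 2},
    {"order_id": 42, "sku": "B7", "qty": 1},
]
```

Once the rows are flat, loading them into a table and querying them with SQL is the easy part, which is rather the point: each tool gets the job it is good at.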
Part of being a good Data Engineer is touching the whole stack, up and down, from the DevOps to the data pipeline. That requires a broad set of skills around the entire architecture. Using the correct tools will make everything run more efficiently, and Data Engineers will be learning and happy, with increased tenures.
Closing Thoughts.
Put out those torches and put down the pitchforks, you raving-mad peasants. Sure, I might have come off a little strong in the beginning, but you have to admit I’m right. Everyone is obsessed with SQL, as they well should be, but you need to plug your ears and not listen to the siren song.
Push yourself to learn more, and break away from SQL to expand your horizons. Don’t become that old, haggard, and grumpy Database Administrator sitting in the back room writing the same TSQL for the next 20 years. You can do better. You can do more.
The two main problems I see in data engineering regarding SQL: using SQL for everything, and avoiding SQL even when it’s the best solution, usually because it’s not a ‘real language’.
This article was superb. Why wasn't Python listed in the language set? Seems like with databricks, airflow and Snowflake Python would be THE language to learn besides SQL?