DuckDB for Data Engineering

Aug 14, 2023

Analytics or Pipeline?

1 Comment

Aug 16, 2023

I think dbt is actually the answer to the issue of SQL modularity and testing. The project you refer to just isn’t following best practices: https://docs.getdbt.com/guides/best-practices/how-we-structure/2-staging

That said, I personally aim for as much SQL for the basics, then do the rest with whatever tools the team chooses (probably Python). The reason is that while SQL can be a painful monstrosity to look at, it’s declarative and that means that you get performance boosts any time you upgrade your database without needing to update any SQL (yes there are regressions but after those get fixed).

The problem with maintaining your own code is that it doesn’t get better for free and you’re relying on your team to make sure it works and as business logic changes, you don’t miss any assumptions in the imperative logic many hands have touched. Sure that’s true of databases, but multiple companies have collaborated and run these systems in production, they’ll run into and resolve issues for you faster than you’ll encounter them much less fix them.

It’s like owning your own house versus renting. Yeah you get to own your own property, but if something breaks, you’re now the landlord that must pay to fix it.

So maximize SQL, make it modular and testable with dbt, and write your own code only when necessary.

Expand full comment