Discussion about this post

Abby Walker

Well-written post, especially in calling out the importance of considering ‘how’ and ‘when’ transformations should be implemented. It hit the nail on the head for me and made me chuckle (so perhaps I didn’t cry at recalling the pain of debugging logic “hidden” in stored procs)!

smoortema

As someone who is used to warehouses running on SQL Server, with SSIS and stored procedures, I am wondering what the alternative looks like. I guess the Databricks alternative is a Python notebook, but how is it better? Business logic is still stored in code, so why would Python code provide better visibility than SQL code? Both a stored procedure and a notebook can be version-controlled, so I do not see a difference here either. There is no Git integration for stored procedures in Databricks yet, but the code can still be pushed to a Git repository, and Git integration could also be implemented in Databricks in the future. For testing, I can imagine improvements over what is possible with a SQL stored procedure, but I would not say that a stored procedure is hard to test either. I guess it also depends on the kind of stored procedure in question. Instead of CTEs or nested queries, we often use temporary tables in our code, which lets us check the data at intermediate steps, just as a notebook does.

Perhaps I am just not familiar enough with more modern approaches, so I would appreciate it if you could point me in the right direction to find the answers. (I have read quite a lot of Databricks documentation, and it did not help.)
