4 Comments
Abby Walker

Well-written post, especially calling out the importance of considering how and when transformations should be implemented. It hit the nail on the head for me, and made me chuckle (perhaps so that I wouldn’t cry at recalling the past pain of debugging logic “hidden” in stored procs)!

smoortema

As someone who is used to warehouses running on SQL Server, with SSIS and stored procedures, I am wondering what the alternative looks like. I guess the Databricks alternative is a Python notebook, but how is it better? Business logic is still stored in code, so why would Python code provide better visibility than SQL code? Both a stored procedure and a notebook can be version-controlled, so I do not see a difference here either. There is no Git integration for stored procedures in Databricks yet, but the code can still be pushed into a Git repository, and Git integration could also be implemented in Databricks in the future. For testing, I can imagine improvements over what is possible with a SQL stored procedure, but I would not say that a stored procedure is hard to test either. I guess it also depends on the kind of stored procedure in question. Instead of CTEs or nested queries, we often use temporary tables in our code, which lets us check the data at intermediate steps, the same way a notebook allows you to.

I am probably just not familiar enough with more modern approaches, so I would appreciate it if you could point me in the right direction to find the answers. (I have read quite a lot of Databricks documentation, and it did not help.)

Adrian Pasek

Try dbt. One of the reasons for its existence is the obscurity of stored procedures.

Luigi M.

Nice article. Actually, I was also surprised when I heard about the introduction of SQL stored procedures on Databricks, since they have been doing things "differently" so far. I asked some of their guys what the purpose of it is, and it looks like it is mainly meant for migrations from other platforms where you already have stored procedures in place and you don't have the time (or are too lazy) to convert the logic to accommodate the Databricks tools.

I am just not sure that you are being completely frank, at the end of the article, when you say that you don't have an opinion :)
