Guilty as charged. When creating new models/exploring new modeling paradigms/doing iterative and in-depth experimentation/explaining complex topics, I simply have not found a better alternative to notebooks. They're rich, portable, and implicitly designed to support iterative workflows while keeping data in memory, which saves me, probably, a billion hours a day.
new problem -> a series of notebooks as I explore the problem -> the final notebook where I solve the problem -> save artifacts -> test artifacts -> build production code around artifacts.
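To make the last three steps concrete, here is a minimal sketch of the "save artifacts -> test artifacts -> build production code around artifacts" hand-off. Everything in it is an assumption for illustration: the toy least-squares "model", the JSON file format, and the names `train_model`, `save_artifact`, `load_artifact`, and `predict` are all hypothetical, not anyone's actual setup.

```python
import json
import tempfile
from pathlib import Path


def train_model(xs, ys):
    """Stand-in for the notebook work: fit y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def save_artifact(model, path):
    """Save artifacts: persist the fitted parameters outside the notebook."""
    Path(path).write_text(json.dumps(model))


def load_artifact(path):
    """Production side: reload the parameters without rerunning the notebook."""
    return json.loads(Path(path).read_text())


def predict(model, x):
    """Production code built around the artifact."""
    return model["slope"] * x + model["intercept"]


# Final notebook: solve the problem, then save the artifact.
model = train_model([0, 1, 2, 3], [1, 3, 5, 7])
artifact_path = Path(tempfile.mkdtemp()) / "model.json"  # hypothetical location
save_artifact(model, artifact_path)

# Test artifacts: the reloaded model must match what the notebook produced.
restored = load_artifact(artifact_path)
assert restored == model
assert abs(predict(restored, 4) - 9.0) < 1e-9
```

The point of the split is that only `load_artifact` and `predict` would ship; the notebook and `train_model` stay on the exploration side.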
Modern version control systems and coding environments are not designed for the complex, experimental work that usually kicks off a data science project. That's what notebooks are for. But of course the notebooks themselves never run in a production setting, or at least I hope to god they don't.
(This is all for data science work. For data engineering I'm not really sure why they're so popular.)
I agree with all of this. My workflow mirrors this approach, though I work in civil/structural engineering, which is obviously quite different from Data Engineering. I use notebooks to explore, solve and document problems in my projects. Once I reach a point where I'm ready to deploy a solution, I'll transition out of a notebook to implement it. The idea of using a notebook in any production setting strikes me as highly impractical.
For a long time at the start, I too was addicted. The lack of reusability became a big pain. Thanks for bringing it all together so nicely.