Discussion about this post

Franco

Great read! I also try to get off the PySpark train when possible, especially when the data easily fits in RAM.

My only comment is that in this particular case, I'd go with Polars! I love DuckDB, but what I love even more is integration tests. The way DuckDB manages its connection to AWS doesn't let me use Moto to mock the AWS services, which is painful.

