I have found that DuckDB still has a few issues that can cause it to blow up unexpectedly, such as loading a DataFrame that has the same number of rows as the type-sampling process uses, or columns that are entirely null/None.
Polars has never failed me, and it is the fastest option for the processing I do: it takes around half the time of the other approaches (pandas, Arrow, DuckDB) for the same tasks.
For bonus points, try Ibis with Polars as the backend rather than DuckDB. Bliss!
The Ibis syntax is also much cleaner, closer to R's dplyr.
Good post! I will try Polars for my master's degree project. I am currently using Spark, but only for one large dataset, so Polars may be a better choice.
I'm unsure whether this "means" anything, but I happened to notice that as of Mar 17, 2025, DuckDB seems to have recently passed Polars in downloads per day and per week.
DuckDB (https://pypistats.org/packages/duckdb): 325,656 downloads in the last day, 2,986,434 in the last week.
Polars (https://pypistats.org/packages/polars): 200,920 downloads in the last day, 2,411,958 in the last week.
Chart of downloads over the past 6 months: https://piptrends.com/compare/duckdb-vs-polars#githubStatistics
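To put the gap in numbers, here is a small Python sketch that works only from the pypistats figures quoted above. The `stats` dictionary and `ratio` helper are hypothetical names for illustration, and these are single snapshots rather than a trend.

```python
# Download counts from pypistats.org as quoted above (as of Mar 17, 2025).
stats = {
    "duckdb": {"last_day": 325_656, "last_week": 2_986_434},
    "polars": {"last_day": 200_920, "last_week": 2_411_958},
}


def ratio(a: str, b: str, window: str) -> float:
    """How many times as many downloads package `a` had as package `b` in `window`."""
    return stats[a][window] / stats[b][window]


for window in ("last_day", "last_week"):
    diff = stats["duckdb"][window] - stats["polars"][window]
    print(f"{window}: DuckDB ahead by {diff:,} ({ratio('duckdb', 'polars', window):.2f}x)")
# prints:
# last_day: DuckDB ahead by 124,736 (1.62x)
# last_week: DuckDB ahead by 574,476 (1.24x)
```

The weekly ratio (about 1.24x) is the steadier of the two, since a single day's count is noisier than a week's.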