3 Comments
Nathaniel Ramm

I have found that DuckDB still has a few issues that can make it blow up unexpectedly, such as loading a dataframe whose row count equals the sample size used for type inference, or loading columns that are entirely null/None.
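To illustrate why all-null columns are awkward for any engine that infers column types from a sample of rows, here is a toy sketch. It is hypothetical and is not DuckDB's actual inference logic; the function name `infer_column_type` and the `sample_size` parameter are made up for illustration. It only inspects the first `sample_size` values, so an all-null sample yields no type, and a value just past the sample is never seen.

```python
from typing import Any, Optional, Sequence


def infer_column_type(values: Sequence[Any], sample_size: int) -> Optional[type]:
    """Toy sample-based type inference (hypothetical, not DuckDB's algorithm).

    Looks only at the first `sample_size` values and returns the single
    non-None Python type found there, or None if the sample is all None.
    """
    # Collect the distinct non-null types inside the sample window only.
    seen = {type(v) for v in values[:sample_size] if v is not None}
    if not seen:
        # Nothing but nulls in the sample: there is no type to infer.
        return None
    if len(seen) > 1:
        names = sorted(t.__name__ for t in seen)
        raise TypeError(f"mixed types in sample: {names}")
    return seen.pop()


print(infer_column_type([1, 2, None], sample_size=10))
print(infer_column_type([None, None, None], sample_size=10))
# The "x" sits just past the sample window, so it is never inspected.
print(infer_column_type([None, None, "x"], sample_size=2))
```

In this toy model the caller has to decide what an untyped column should become, which is exactly the kind of edge case a real engine must handle explicitly.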

Polars has never failed me, and it is the fastest option for the processing I do: it takes around half the time of the other approaches (pandas, Arrow, DuckDB) on the same tasks.

For bonus points, try Ibis with Polars as the backend rather than DuckDB. Bliss!

The Ibis syntax is also much cleaner, closer to dplyr.

Romario Gomes

Good post! I will try Polars for my master's degree project. I am currently using Spark, but only for a single large dataset, so Polars may be a better choice.

Tom Cal

I'm not sure this "means" anything, but I noticed that as of Mar 17, 2025, DuckDB seems to have recently passed Polars in the number of downloads per day and per week.

DuckDB, https://pypistats.org/packages/duckdb

DuckDB Downloads last day: 325,656

DuckDB Downloads last week: 2,986,434

Polars, https://pypistats.org/packages/polars

Polars Downloads last day: 200,920

Polars Downloads last week: 2,411,958

Chart of downloads over the past 6 months: https://piptrends.com/compare/duckdb-vs-polars#githubStatistics
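A quick check of the pypistats figures quoted in this comment, computing DuckDB's lead over Polars as a ratio and as an absolute gap. The numbers are copied from the comment (as of Mar 17, 2025); the helper name `compare` is just for illustration.

```python
# Download counts quoted from pypistats.org (as of Mar 17, 2025).
downloads = {
    "duckdb": {"last_day": 325_656, "last_week": 2_986_434},
    "polars": {"last_day": 200_920, "last_week": 2_411_958},
}


def compare(duck: int, pol: int) -> tuple[float, int]:
    """Return (DuckDB/Polars ratio rounded to 2 places, DuckDB minus Polars)."""
    return round(duck / pol, 2), duck - pol


for period in ("last_day", "last_week"):
    ratio, gap = compare(downloads["duckdb"][period], downloads["polars"][period])
    print(f"{period}: ratio {ratio}, gap {gap:,}")
```

This prints a ratio of 1.62 (gap 124,736) for the last day and 1.24 (gap 574,476) for the last week, so DuckDB is ahead on both windows.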
