The fact that Daft can read from S3 out of the box, without extra hoops to jump through, is a win. Polars and DuckDB need to get with the times and make S3 and GCS first-class directories.
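For anyone curious, a minimal sketch of what that looks like (the bucket and path are placeholders, and it assumes AWS credentials are available through the usual environment/config chain):

```python
import daft

# Daft resolves s3:// paths natively, so no separate filesystem
# setup is needed before reading.
df = daft.read_parquet("s3://my-bucket/data/*.parquet")  # placeholder bucket/path
df.show()
```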
By default, with no arguments, `daft.context.set_runner_ray()` will spin up a Ray cluster locally and submit work to it, enabling out-of-core processing.
If you set an address (or submit a job via the Ray Jobs API), it will run on a remote cluster instead.
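Roughly, the two modes look something like this (the remote address below is just a placeholder, not a real endpoint):

```python
import daft

# Local mode: with no arguments this spins up a Ray cluster on the
# current machine and submits Daft work to it (out-of-core execution).
daft.context.set_runner_ray()

# Remote mode: point the runner at an existing Ray cluster instead,
# e.g. one you would also target via the Ray Jobs API.
# daft.context.set_runner_ray(address="ray://head-node:10001")  # placeholder address

df = daft.read_parquet("s3://my-bucket/data/*.parquet")  # placeholder path
df.show()
```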
In your aggregate test, it does not appear that the sort for Daft is working.
We should have made it easier to find, but here's the page in the docs for going distributed.
https://www.getdaft.io/projects/docs/en/latest/user_guide/poweruser/scaling-up.html
That link does not resolve.
Ah yes, we renamed the title to make it easier to find.
https://www.getdaft.io/projects/docs/en/latest/user_guide/poweruser/distributed-computing.html