2 Comments

The fact that Daft can read from S3 out of the box, without extra hoops to jump through, is a win. Polars and DuckDB need to get with the times and make S3 and GCS first-class directories.
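
For anyone curious, it really is a one-liner; a minimal sketch (the bucket and path here are made up, and this assumes your AWS credentials are available in the environment):

```python
import daft

# Hypothetical S3 path; Daft reads it directly, no extra client setup needed.
df = daft.read_parquet("s3://my-bucket/data/*.parquet")
df.show()
```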

We should have made it easier to find, but here's the page in the docs on going distributed.

https://www.getdaft.io/projects/docs/en/latest/user_guide/poweruser/scaling-up.html

By default, with no arguments, `daft.context.set_runner_ray()` will spin up a Ray cluster locally and then submit work to it, enabling out-of-core processing.
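
A minimal sketch of that default local mode (the read path below is a placeholder):

```python
import daft

# No arguments: spins up a local Ray cluster and submits Daft's work to it,
# enabling out-of-core processing on a single machine.
daft.context.set_runner_ray()

df = daft.read_parquet("s3://my-bucket/data/*.parquet")  # placeholder path
print(df.count_rows())
```

Note that the runner generally needs to be set before running any dataframe operations.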

If you set an address (or submit the job via the Ray Jobs API), it will run on a remote cluster instead.
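
For example (the address is a placeholder for your own cluster's head node):

```python
import daft

# Point the runner at an existing remote Ray cluster instead of a local one.
daft.context.set_runner_ray(address="ray://my-head-node:10001")
```

Alternatively, submitting via the Ray Jobs API lets the job run on whatever cluster it's submitted to, without hardcoding an address.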
