The fact that Daft can read from S3 out of the box, without extra hoops to jump through, is a win. Polars and DuckDB need to get with the times and make S3 and GCS first-class directories.
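For anyone curious, a minimal sketch of what that looks like (the bucket and path are placeholders, and it assumes AWS credentials are available through the usual environment/config chain):

```python
import daft

# Daft resolves s3:// paths natively, so no separate filesystem
# setup is needed before reading.
df = daft.read_parquet("s3://my-bucket/data/*.parquet")  # placeholder bucket/path
df.show()
```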
By default, with no arguments, `daft.context.set_runner_ray()` will spin up a Ray cluster locally and submit work to it, enabling out-of-core processing.
If you set an address (or submit a job via the Ray Jobs API), it will run on a remote cluster instead.
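Roughly, the two modes look something like this (the remote address below is just a placeholder, not a real endpoint):

```python
import daft

# Local mode: with no arguments this spins up a Ray cluster on the
# current machine and submits Daft work to it (out-of-core execution).
daft.context.set_runner_ray()

# Remote mode: point the runner at an existing Ray cluster instead,
# e.g. one you would also target via the Ray Jobs API.
# daft.context.set_runner_ray(address="ray://head-node:10001")  # placeholder address

df = daft.read_parquet("s3://my-bucket/data/*.parquet")  # placeholder path
df.show()
```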
In your aggregate test, it does not appear that the sort for Daft is working.
We should have made it easier to find, but here's the page in the docs for going distributed.
https://www.getdaft.io/projects/docs/en/latest/user_guide/poweruser/scaling-up.html
That link does not resolve.
Ah yes, we renamed the title to make it easier to find.
https://www.getdaft.io/projects/docs/en/latest/user_guide/poweruser/distributed-computing.html