5 Comments

Cool how you outlined your thinking and your experiment steps!

Expand full comment

Would you mind trying this Rust code instead? Although a bit less naive than your first approach, it avoids a lot of the unnecessary allocations existing in your code

fn main() {

let before = Instant::now();

let source_file = File::open("divvy-tripdata.csv").expect("can't open file");

let mut reader = BufReader::new(source_file);

let destination_file = File::create("divvy-biketrip.tsv").expect("problem with file");

let mut writer = BufWriter::new(destination_file);

let mut buffer = String::new();

loop {

buffer.clear();

let read_count = reader.read_line(&mut buffer).expect("something went wrong reading the file");

if read_count == 0 {

break;

}

let replaced = buffer.replace(",", "\t");

writer.write(replaced.as_bytes()).expect("problem writing lines");

}

println!("Elapsed time: {:.2?}", before.elapsed());

}

Expand full comment

Love the code!

Expand full comment

Rayon is insanely easy to use to add parallelization to your code, however your time measuring here is unfair as in all cases you're measuring the time that takes to read & write from disk, which is the biggest overhead. A more fair comparison of the impact due to the addition of Rayon would be to measure after you've loaded the data into memory and before dumping it into the file.

Expand full comment

some random dude. nice. I like it.

Expand full comment