5 Comments
User's avatar
Nick E.'s avatar

Cool how you outlined your thinking and your experiment steps!

Expand full comment
Marc Enriquez's avatar

Would you mind trying this Rust code instead? Although a bit less naive than your first approach, it avoids a lot of the unnecessary allocations existing in your code

fn main() {

let before = Instant::now();

let source_file = File::open("divvy-tripdata.csv").expect("can't open file");

let mut reader = BufReader::new(source_file);

let destination_file = File::create("divvy-biketrip.tsv").expect("problem with file");

let mut writer = BufWriter::new(destination_file);

let mut buffer = String::new();

loop {

buffer.clear();

let read_count = reader.read_line(&mut buffer).expect("something went wrong reading the file");

if read_count == 0 {

break;

}

let replaced = buffer.replace(",", "\t");

writer.write(replaced.as_bytes()).expect("problem writing lines");

}

println!("Elapsed time: {:.2?}", before.elapsed());

}

Expand full comment
Daniel Beach's avatar

Love the code!

Expand full comment
Some random dude's avatar

Rayon is insanely easy to use to add parallelization to your code, however your time measuring here is unfair as in all cases you're measuring the time that takes to read & write from disk, which is the biggest overhead. A more fair comparison of the impact due to the addition of Rayon would be to measure after you've loaded the data into memory and before dumping it into the file.

Expand full comment
Daniel Beach's avatar

some random dude. nice. I like it.

Expand full comment