Careful young Padawan…I was chewed out for a recent post on querying a 1TB dataset with the duck because I didn’t give some long-winded exact title such as “I worked with a 1TB dataset in duckdb, and pruned the columns to a subset that is common for analytical queries and it ran in under 30 seconds”…doesn’t quite roll off the tongue 🤣
Don’t tread too quickly, youngling. Benchmarking is a serious business; one who speaks without precision is no better than a foolish Padawan seduced by the dark side of social clicks.
Exactly. To me, these kinds of blogs should give people some insight into what they can expect if they work with a given system. It could have said "I worked with 30GB of columnar data from a 1TB dataset in duckdb...". That is precise and not "long-winded." Some systems that people use don't do column pruning, so this is an opportunity to talk about it and say you like your preferred system because it does. It also gives people some insight into processing time relative to the true data size, which helps them extrapolate for their own use cases.
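To put rough numbers on that, here is a minimal sketch of the arithmetic. The column names and per-column sizes are made up for illustration; only the 30GB and 1TB totals come from the discussion above, and `bytes_scanned` is a hypothetical helper, not part of any library.

```python
GB = 10**9

def bytes_scanned(column_sizes, needed_columns, prunes_columns=True):
    """Estimate the bytes a scan reads from a columnar table.

    With column pruning, only the columns the query touches are read.
    Without it, every column is read regardless of what the query needs.
    """
    if not prunes_columns:
        return sum(column_sizes.values())
    return sum(column_sizes[name] for name in needed_columns)

# Hypothetical layout: 1 TB in total, of which the query needs 30 GB.
column_sizes = {
    "event_time": 10 * GB,
    "user_id": 12 * GB,
    "amount": 8 * GB,
    "payload": 970 * GB,  # wide column the analytical query never touches
}
needed = ["event_time", "user_id", "amount"]

pruned = bytes_scanned(column_sizes, needed)
full = bytes_scanned(column_sizes, needed, prunes_columns=False)
fraction = pruned / full

print(f"with pruning:    {pruned / GB:.0f} GB")
print(f"without pruning: {full / GB:.0f} GB")
print(f"fraction read:   {fraction:.1%}")
```

Under these assumed sizes, the pruned scan reads about 3% of the dataset, which is the figure a reader would need to extrapolate to a system that reads every column.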
Haters are gonna hate
Anyone questioning why you didn't run a query on the entire dataset needs to be asked to leave the room.
irl chortling at “milk toast”