6 Comments
Matt Martin:

Good stuff…nice and clean and “real time baby!”

Alex A:

The final screenshot shows reading and inspecting the table in code. Is that also via Daft? What have you found to be the best way to manage querying and rolling back to earlier versions of Delta tables?

Daniel Beach:

Yes, via Daft. I manage and use multiple Delta Lake tables in the 300TB+ range, and I've never had to roll back versions; instead, I go hardcore up front on data quality and reliable pipelines to avoid ever needing to. Otherwise, use MERGE statements to create idempotent pipelines.
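The idempotency that MERGE buys can be sketched in plain Python. Here a dict keyed on the merge predicate stands in for a Delta table; the `merge` function and its arguments are illustrative, not a real Daft or Delta Lake API:

```python
# Sketch of MERGE (upsert) semantics that make a pipeline idempotent:
# re-running the same batch leaves the table unchanged.

def merge(table: dict, batch: list[dict], key: str) -> dict:
    """Upsert each row: update when the key matches, insert when it doesn't."""
    merged = dict(table)
    for row in batch:
        # WHEN MATCHED UPDATE ... / WHEN NOT MATCHED INSERT ...
        merged[row[key]] = row
    return merged

table = {1: {"id": 1, "amount": 10}}
batch = [{"id": 1, "amount": 15}, {"id": 2, "amount": 7}]

once = merge(table, batch, "id")
twice = merge(once, batch, "id")  # replaying the same batch is a no-op
assert once == twice
```

Because a replayed batch produces the same table state, a failed-and-retried pipeline run can't double-insert rows, which is what makes rollbacks largely unnecessary.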

Alex A:

I have limited experience with Delta Lake tables, and certainly not at that 300TB scale, but compared to raw storage, isn't one of the advertised benefits the ability to keep history? What would be the use of history if not to query or restore an earlier version?

Marcus Rosen:

Great read, but you don't need to (and absolutely should not) embed AWS credentials into your container at build time; if you need credentials at build time, use Docker secrets instead.

All AWS services will automatically load credentials into your container provided you attach an IAM role. If running locally, you can inject credentials at run time via a .env file.

Bruno Jander Santos Lima:

Awesome article! It helped me better understand how to work with an image!
