6 Comments
Matt Martin:

Good stuff…nice and clean and “real time baby!”

Alex A:

The final screenshot shows reading and inspecting the table in code. Is that also via Daft? What have you found to be the best way to manage querying and rolling back to earlier versions of Delta tables?

Daniel Beach:

Yes, via Daft. I manage and use multiple Delta Lake tables in the 300TB+ range, and I've never had to roll back versions; instead, I go hardcore up front on data quality and reliable pipelines to avoid ever needing to. Otherwise, use MERGE statements to create idempotent pipelines.
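The idempotency that MERGE buys can be sketched in plain Python. Here a dict keyed on the merge predicate stands in for a Delta table; the `merge` function and its arguments are illustrative, not a real Daft or Delta Lake API:

```python
# Sketch of MERGE (upsert) semantics that make a pipeline idempotent:
# re-running the same batch leaves the table unchanged.

def merge(table: dict, batch: list[dict], key: str) -> dict:
    """Upsert each row: update when the key matches, insert when it doesn't."""
    merged = dict(table)
    for row in batch:
        # WHEN MATCHED UPDATE ... / WHEN NOT MATCHED INSERT ...
        merged[row[key]] = row
    return merged

table = {1: {"id": 1, "amount": 10}}
batch = [{"id": 1, "amount": 15}, {"id": 2, "amount": 7}]

once = merge(table, batch, "id")
twice = merge(once, batch, "id")  # replaying the same batch is a no-op
assert once == twice
```

Because a replayed batch produces the same table state, a failed-and-retried pipeline run can't double-insert rows, which is what makes rollbacks largely unnecessary.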

Alex A:

I have limited experience with Delta Lake tables, and certainly not at that 300TB scale, but compared to raw storage, isn't one of the advertised benefits the ability to keep history? What would be the use of history if not to query or restore an earlier version?

Marcus Rosen:

Great read, but you don't need to (and absolutely should not) embed AWS credentials into your container at build time; if you need credentials at build time, use Docker secrets instead.

All AWS services will automatically load credentials into your container provided you attach an IAM role. If running locally, you can inject credentials at run time via a .env file.

Bruno Jander Santos Lima:

Awesome article! It helped me better understand how to work with an image!
