I'm an old-school data person and a big fan of Kimball, mainly because I've had amazing success delivering data on a repeatable, Lego-like framework.
That said, way before data lakes and clouds, most MPP data warehouse databases did not enforce primary keys and foreign keys, for performance reasons. They allowed you to define them in the DDL for documentation purposes, and some BI / ETL tools even used that metadata to assist with code generation. Kimball even included de-duplication checking within the recommended subsystems for an EDW framework. The more things change, the more they stay the same. ;)
The question is then ... was there an issue to solve in the first place? Data Lakes don't solve this problem, at least not yet. The user has to manually solve it.
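For what "manually solve it" can look like, here is a minimal sketch of a duplicate-key check run before loading a table. It assumes the rows are already in memory as dicts; the function name `find_duplicate_keys`, the `customer_id` column, and the sample rows are all made up for illustration and are not from any particular tool.

```python
from collections import Counter


def find_duplicate_keys(rows, key_columns):
    """Return each key tuple that appears more than once, with its count."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample data: customer_id is meant to be unique.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
    {"customer_id": 1, "name": "Ada L."},
]

duplicates = find_duplicate_keys(customers, ["customer_id"])
print(duplicates)  # {(1,): 2}
```

In a real pipeline the same idea is usually a GROUP BY ... HAVING COUNT(*) > 1 query against the staged data, but the point is the same: nothing stops the duplicate unless you check for it yourself.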
I miss real databases. And I do love Kimball (and Ross, and Adamson, and Reeves).
"move on with life"
But what if I can't?! Hashes are far too clever for me at the moment. It's good to know Data Lakes have things covered though.
<Data Vault has entered the chat>
But it doesn't explain how the data lake solves this issue?