On just using composite keys vs hashing on top

Hash Keys in the Data Vault – Scalefree Architecture does a good job on explaining what you lose if you just use composite keys:

  • You may have very long composite keys (so the hash sort of compress the info).
  • Some DBs seemed not to be efficient on joins with variable length string columns.

While using the business keys in Data Vault might be an option, it is actually a slow one, using a lot of storage (even more than hash keys). Especially in links and their dependent satellites, many composite business keys are required to identify the relationship or transaction / event in a link – and to describe it in the satellite. This would require a lot of storage and slow down the loading process because not all database engines have the capability to execute efficient joins on variable length business keys. On the other hand we would have too many columns in the link, because every business key must be a part of the link. The issue at this point is that we also have different data types with different lengths in the links. This issue is exaggerated because it is also required to replicate the business keys into their satellites. To guarantee a consistent join performance, the solution is to combine the business keys into a single column value by using hash functions to calculate a unique representation of a business object.