Metadata
- Author: Dave Connors
- Full Title:: Surrogate Keys in Dbt: Integers or Hashes?
- Category:: 🗞️Articles
- Document Tags:: sql, Surrogate keys,
- URL:: https://docs.getdbt.com/blog/managing-surrogate-keys
- Finished date:: 2023-08-14
Highlights
For my money, the simplicity of using hashed keys far outweighs the potential benefits of having MIISKs in your data model. Building with dbt works best when all parts of your project are idempotent, and hashed keys require close to zero maintenance. The cost of time spent rebuilding your surrogate keys in your data models if you can’t recreate them with a simple
dbt run
usually offsets any modest performance and storage gains you might be able to achieve with MIISKs. (View Highlight)
The analytical warehouses we use now no longer have the same constraints that traditional warehouses had — joins on strings aren’t notably less performant than those on integers, and storing slightly larger values in the surrogate key column is peanuts given the relative cost of storage on these platforms. This strategy also removes the need for tight coupling of transformations to propagate the surrogate key values across our project — anywhere the inputs for the surrogate keys are present, the hashing function produces the same keys, so we can take advantage of parallel processing in our warehouse and avoid the bottlenecks we had before.