Highlights

  • You’ve just declared how your data flows and when you expect it to be up-to-date.
  • A single data source can power dozens of data products, each with different data freshness requirements.
  • A single data product can consume data from dozens of different sources, each of which is updated at a different cadence. Trying to chop up this graph of data assets into discrete workflows that each get executed synchronously often feels like forcing a square peg into a round hole.
  • Another friction with imperative workflow-based orchestration is that every time you add an asset, you have to find a DAG to put it in to get it scheduled. This means you have to worry about whether DAGs are getting too large and unwieldy, at one extreme, or too small and fragmented, at the other. This can intersect with organizational frictions: which team’s workflow should a shared task belong to?
  • Granular versioning
  • If the versions of the upstream data or code that an asset depends on have changed since the asset was last materialized, then the asset is considered stale.
  • Versions can also be applied to source assets.
  • You can supply a function that observes a source asset and reports its version, for use in determining which assets are stale.
  • You can declare that a single asset is composed of multiple partitions, and Dagster can then materialize individual partitions.
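The staleness rule in the highlights (an asset is stale if the upstream data or code versions it depends on have changed since its last materialization) can be sketched in plain Python. The asset names and the record structure below are illustrative assumptions, not Dagster's internals:

```python
# Minimal sketch of version-based staleness. Assumes each asset records
# the code version and upstream data versions it was last built with.
# Structures and names are illustrative, not Dagster's actual API.

def is_stale(current_code_version, current_upstream_versions, last_materialization):
    """Stale if never materialized, or if code or any upstream version changed."""
    if last_materialization is None:
        return True
    if last_materialization["code_version"] != current_code_version:
        return True
    return last_materialization["upstream_versions"] != current_upstream_versions


# Example: a hypothetical `daily_metrics` asset last built with code
# version "1" against version "7" of its upstream `raw_events` data.
last = {"code_version": "1", "upstream_versions": {"raw_events": "7"}}

print(is_stale("1", {"raw_events": "7"}, last))  # False: nothing changed
print(is_stale("1", {"raw_events": "8"}, last))  # True: upstream data changed
print(is_stale("2", {"raw_events": "7"}, last))  # True: code changed
```

The comparison is purely declarative: no workflow has to be re-run just to discover that nothing changed, which is the contrast the highlights draw with imperative, synchronously executed DAGs.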
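For the highlight about source assets, one plausible shape of an observation function is a content hash: it reports a version string derived from the source data, so downstream assets can be judged stale when the source changes. The hashing approach and function name here are assumptions for illustration:

```python
import hashlib
import io

# Sketch of an observation function for a source asset: report a version
# derived from the data's contents. Name and approach are illustrative.

def observe_source_version(fileobj):
    """Return a short content hash to serve as the source's data version."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(8192), b""):
        digest.update(chunk)
    return digest.hexdigest()[:12]  # short, stable version string

v1 = observe_source_version(io.BytesIO(b"id,amount\n1,10\n"))
v2 = observe_source_version(io.BytesIO(b"id,amount\n1,10\n2,20\n"))
print(v1 == v2)  # False: new rows yield a new observed version
```

Because the version is a pure function of the data, re-observing an unchanged source yields the same version and leaves downstream assets marked fresh.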
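The last two highlights describe partitioned assets: one logical asset whose data is split into chunks (for example, one per day) that can each be materialized on its own. A minimal sketch, with a hypothetical in-memory store and compute function standing in for real storage and logic:

```python
from datetime import date, timedelta

# Sketch of a partitioned asset: one logical asset, many daily partitions,
# each materializable individually. The storage dict and function names
# are illustrative assumptions.

storage = {}  # partition_key -> materialized data for that partition

def materialize_partition(asset_compute, partition_key):
    """Materialize just one partition of the asset."""
    storage[partition_key] = asset_compute(partition_key)

def daily_events(partition_key):
    # Hypothetical compute: in practice this would read that day's raw data.
    return f"events for {partition_key}"

# Materialize two individual partitions from the asset's daily range.
start = date(2023, 1, 1)
for offset in (0, 1):
    materialize_partition(daily_events, (start + timedelta(days=offset)).isoformat())

print(sorted(storage))  # ['2023-01-01', '2023-01-02']
```

The payoff is granularity: a late-arriving day of source data only requires re-materializing that day's partition, not the whole asset.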