Highlights

  • You’ve just declared how your data flows and when you expect it to be up-to-date.
  • A single data source can power dozens of data products, each with different data freshness requirements.
  • A single data product can consume data from dozens of different sources, each of which is updated at a different cadence. Trying to chop up this graph of data assets into discrete workflows that each get executed synchronously often feels like forcing a square peg into a round hole.
  • Another friction with imperative workflow-based orchestration is that every time you add an asset, you have to find a DAG to put it in to get it scheduled. This means you have to worry about whether DAGs are getting too large and unwieldy, at one extreme, or too small and fragmented, at the other. This can intersect with organizational frictions: which team’s workflow should a shared task belong to?
  • Granular versioning
  • If the versions of the upstream data or code that an asset depends on have changed since the asset was last materialized, then the asset is considered stale.
  • Versions can also be applied to source assets.
  • You can supply a function that observes a source asset and reports its version, for use in determining which assets are stale.
  • You can declare that a single asset is composed of multiple partitions, and Dagster can then materialize individual partitions.
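The staleness rule in the highlights (an asset is stale if the upstream data or code versions it depends on have changed since its last materialization) can be sketched in plain Python. The asset names and the record structure below are illustrative assumptions, not Dagster's internals:

```python
# Minimal sketch of version-based staleness. Assumes each asset records
# the code version and upstream data versions it was last built with.
# Structures and names are illustrative, not Dagster's actual API.

def is_stale(current_code_version, current_upstream_versions, last_materialization):
    """Stale if never materialized, or if code or any upstream version changed."""
    if last_materialization is None:
        return True
    if last_materialization["code_version"] != current_code_version:
        return True
    return last_materialization["upstream_versions"] != current_upstream_versions


# Example: a hypothetical `daily_metrics` asset last built with code
# version "1" against version "7" of its upstream `raw_events` data.
last = {"code_version": "1", "upstream_versions": {"raw_events": "7"}}

print(is_stale("1", {"raw_events": "7"}, last))  # False: nothing changed
print(is_stale("1", {"raw_events": "8"}, last))  # True: upstream data changed
print(is_stale("2", {"raw_events": "7"}, last))  # True: code changed
```

The comparison is purely declarative: no workflow has to be re-run just to discover that nothing changed, which is the contrast the highlights draw with imperative, synchronously executed DAGs.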
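For the highlight about source assets, one plausible shape of an observation function is a content hash: it reports a version string derived from the source data, so downstream assets can be judged stale when the source changes. The hashing approach and function name here are assumptions for illustration:

```python
import hashlib
import io

# Sketch of an observation function for a source asset: report a version
# derived from the data's contents. Name and approach are illustrative.

def observe_source_version(fileobj):
    """Return a short content hash to serve as the source's data version."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(8192), b""):
        digest.update(chunk)
    return digest.hexdigest()[:12]  # short, stable version string

v1 = observe_source_version(io.BytesIO(b"id,amount\n1,10\n"))
v2 = observe_source_version(io.BytesIO(b"id,amount\n1,10\n2,20\n"))
print(v1 == v2)  # False: new rows yield a new observed version
```

Because the version is a pure function of the data, re-observing an unchanged source yields the same version and leaves downstream assets marked fresh.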
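The last two highlights describe partitioned assets: one logical asset whose data is split into chunks (for example, one per day) that can each be materialized on its own. A minimal sketch, with a hypothetical in-memory store and compute function standing in for real storage and logic:

```python
from datetime import date, timedelta

# Sketch of a partitioned asset: one logical asset, many daily partitions,
# each materializable individually. The storage dict and function names
# are illustrative assumptions.

storage = {}  # partition_key -> materialized data for that partition

def materialize_partition(asset_compute, partition_key):
    """Materialize just one partition of the asset."""
    storage[partition_key] = asset_compute(partition_key)

def daily_events(partition_key):
    # Hypothetical compute: in practice this would read that day's raw data.
    return f"events for {partition_key}"

# Materialize two individual partitions from the asset's daily range.
start = date(2023, 1, 1)
for offset in (0, 1):
    materialize_partition(daily_events, (start + timedelta(days=offset)).isoformat())

print(sorted(storage))  # ['2023-01-01', '2023-01-02']
```

The payoff is granularity: a late-arriving day of source data only requires re-materializing that day's partition, not the whole asset.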