To discuss

Valery’s work during May

  • During May, Anis will be on holiday, leaving just Valery and me as DEs, so it makes sense to have a larger scope to:

Data engineering final/ideal vision

  • Regarding data engineers, as with any other software engineering role:
    • Embedded in multidisciplinary squads, working in bounded contexts.
    • Managed by the manager of each team.
    • Depending on our needs at that point, and again as with regular software engineering, we may need some way to maintain and develop common tooling, cross-domain projects, etc.: a “data platform team”, and/or data architects, and/or tech sprints for data.
    • We’ll see what happens with me there.
  • Including data in Domain-Driven Design: a team’s data is also part of its API, and the same principles apply (ownership, contracts, shared kernels…) → each team is responsible for the data it produces, is aware that it cannot simply break formats (because of downstream dependencies), and models its data better.
  • DEs in which Lab!
  • DE

How to get there from where we are

You cannot start from an embedded model:

  • Your data engineering needs (firefighting and new features) are probably scattered everywhere and you don’t have a DE for each team (and given your current identified work, it may not even make sense or may change quickly): you need the flexibility of central prioritization.
  • A single embedded member and a single centralized member are not enough in each area. Either you live with a bus factor of 1 in each area, or you force them to keep track of what the other is doing, so they essentially act as a team but with a split brain across their contexts (the opposite of optimizing a team’s flow).
  • A drawback of the embedded model is its focus on short-term, domain-specific needs. Meanwhile, you have common data debt that no team has a real incentive to prioritize, even though tackling it would improve the productivity of all teams:
    • Poor monitoring across the whole pipeline jungle, which makes problems hard to detect and recover from and puts them outside the realm of any single team (e.g., https://www.notion.so/seedtag/Postmortem-contextualization-data-a32c0ce690614876b243a36102428dd3)
    • No orchestration and no data discoverability/lineage tooling: inability to track downstream dependencies, duplicated data…
    • No data modelling at all: duplicated logic in different pipelines, a sprawl of highly coupled ad-hoc transformations.

A central team would give you the flexibility to move with your workload (firefighting and feature work) while combining it with a more strategic view and tackling debt. As you go on, you can organically realize you need more DEs if you are unable to fulfill both the needs of the teams AND the central debt, and you can start embedding those in the teams with the most load. However, even in that case, you will likely want centralized management until your data engineering practices mature and frequent collaboration with the central team is no longer needed.

Valery’s work from May until we get another DE / end of year

  • Given the above: I suggest that Valery move to the central team, keeping me as manager (which also helps me set the culture).

Once we have another DE

  • We will talk… but I still don’t see the point of a single DE in a team.
