- Tags:: #📝CuratedNotes, [[Data methodology|Data Methodology]], [[scalable-computing-of-features|Scalable Computing Of Features]]

## Favorite refs

My favorite so far: https://towardsdatascience.com/how-to-structure-a-data-science-project-for-readability-and-transparency-360c6716800

On repository structure, we have the typical:

- [Home - Cookiecutter Data Science](https://drivendata.github.io/cookiecutter-data-science/?utm_source=pocket_mylist)

But there are others, such as:

- [dslp/dslp: The Data Science Lifecycle Process is a process for taking data science teams from Idea to Value repeatedly and sustainably. The process is documented in this repo. (github.com)](https://github.com/dslp/dslp)

Close to these, going up in complexity, are tools that build pipelines as DAGs (see the sketch at the end of this note):

- [Kedro spaceflights tutorial — Kedro 0.17.7 documentation](https://kedro.readthedocs.io/en/stable/03_tutorial/01_spaceflights_tutorial.html)
- [Hamilton: Scaling to Match your Data! | Stitch Fix Technology – Multithreaded](https://multithreaded.stitchfix.com/blog/2022/02/22/scaling-hamilton/)

A full framework: [Why Metaflow | Metaflow Docs](https://docs.metaflow.org/introduction/why-metaflow)

Also, a good summary of how teams share and reuse features: [How Machine Learning Teams Share and Reuse Features - Tecton](https://www.tecton.ai/blog/how-machine-learning-teams-share-and-reuse-features/)
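
To make the DAG idea concrete, here is a minimal hand-rolled sketch of the convention that Hamilton formalizes (and that Kedro wires up explicitly as pipelines of nodes): each function produces the feature named after it, and its parameter names declare the upstream features it depends on. The column names (`spend`, `signups`) and the tiny resolver are hypothetical, for illustration only; the real libraries handle graph construction, validation, and execution for you.

```python
import inspect

import pandas as pd


# Each function is a node in the DAG: the function name is the feature it
# produces, and its parameter names are the upstream features it depends on.
def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
    return spend / signups


def spend_zero_mean(spend: pd.Series) -> pd.Series:
    return spend - spend.mean()


def spend_zero_mean_unit_var(spend_zero_mean: pd.Series, spend: pd.Series) -> pd.Series:
    return spend_zero_mean / spend.std()


# Hypothetical registry mapping feature names to the functions that build them.
REGISTRY = {
    "spend_per_signup": spend_per_signup,
    "spend_zero_mean": spend_zero_mean,
    "spend_zero_mean_unit_var": spend_zero_mean_unit_var,
}


def compute(feature: str, available: dict, registry: dict = REGISTRY) -> pd.Series:
    """Naive recursive resolver: compute `feature` after computing its inputs."""
    if feature not in available:
        fn = registry[feature]
        # Parameter names tell us which upstream features to compute first.
        inputs = {name: compute(name, available, registry)
                  for name in inspect.signature(fn).parameters}
        available[feature] = fn(**inputs)
    return available[feature]


if __name__ == "__main__":
    raw = {"spend": pd.Series([10.0, 20.0, 40.0]),
           "signups": pd.Series([1.0, 2.0, 4.0])}
    print(compute("spend_per_signup", dict(raw)))
    print(compute("spend_zero_mean_unit_var", dict(raw)))
```

The appeal of this style is that the dependency graph lives in the function signatures, so adding or reusing a feature is just adding a function; the frameworks linked above layer scaling, cataloging, and visualization on top of the same idea.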