Metadata

Highlights

  • SQL and Python are the only languages in the repository.
  • Metaflow, our open-source framework which (among other things) lowers the barrier to entry for data scientists to take machine learning from prototype to production
  • Training happens on AWS Batch, leveraging the abstractions provided by Metaflow;
  • Serving is on SageMaker, leveraging the PaaS offering by AWS;
  • Scheduling is on AWS Step Functions,
  • second, we push down to Snowflake all the (distributed) computing, making the Python side of things pretty lightweight, as we are not really bound by machine memory to aggregate and filter data rows.
  • This code shows that since data is already transformed, a simple query is all you need to get your dataset ready for downstream modeling.