In a recent project I was fortunate enough to work on a team that brought a machine learning model into production.

Project Intro

  • got to production quickly (3 months)
  • actually got to production!!!

Key points

  • data gets bigger
    • hosting the model API: even loading the data for a single category with pandas broke our Kubernetes (k8s) deployment
  • networking (infrastructure???) is important
    • we had a model and a way to host it, but getting the setup working took ages
    • standardisation could help here
  • the diagram is true (there is a lot more engineering than data science)
  • data scientists can learn to l̶o̶v̶e̶ appreciate unit testing (and good code structure)
    • data scientists are smart
    • they ask pointed questions and don’t just accept “because we do it this way”
  • reporting/analysis is important
    • a data analyst joining the project had a big impact on its positive exposure
  • data scientists can keep track of data quality (and other things), and engineers can trust them with it
  • baselines!!! (a hackathon where we determined a baseline with heuristics led to better results)
  • usual team stuff
    • retros can also be about emotions (especially in times of a pandemic)
    • mini knowledge exchanges (half an hour, no prep) are great
    • Teams can be useful for
      • PRs
      • pipelines
  • keep it simple, stupid (KISS) applies to data science (DS) projects too
    • simple tooling
    • reduce how much specialist knowledge is required
    • or “it doesn’t have to be fancy”: plain Databricks was fine, and a plain container for the model was okay
  • proof of concept (POC) vs. development
    • these can be done in parallel
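On the "data gets bigger" point: one common way to keep pandas from pulling an entire file into a memory-limited container is to read it in chunks and keep only the rows you need. The sketch below is a minimal illustration under assumptions: the data is a CSV, it has a hypothetical `category` column, and `load_category` is a hypothetical helper, not the code we actually ran.

```python
import io

import pandas as pd


def load_category(source, category, chunksize=100_000):
    """Read `source` in chunks and keep only the rows for one category.

    Only one chunk (plus the rows already kept) is held in memory at a time,
    instead of the whole file. The `category` column name is an assumption.
    """
    parts = [
        chunk[chunk["category"] == category]
        for chunk in pd.read_csv(source, chunksize=chunksize)
    ]
    return pd.concat(parts, ignore_index=True)


# Usage: a tiny in-memory CSV stands in for a large file.
csv = io.StringIO("category,value\na,1\nb,2\na,3\n")
df = load_category(csv, "a", chunksize=2)
print(df["value"].tolist())  # [1, 3]
```

Chunking caps peak memory at roughly one chunk, but the filtered result still has to fit in memory; if it doesn't, the filtering belongs upstream (for example in the data platform) rather than in the API container.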
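On baselines: a heuristic baseline can be as simple as "predict the most frequent label seen for this group", which gives a number any model has to beat. The sketch below uses made-up data, hypothetical column names (`shop`, `label`) and a majority-label heuristic chosen for illustration; it is not the heuristic from the hackathon.

```python
import pandas as pd


def fit_majority_baseline(train, group_col, label_col):
    """Map each group to its most frequent label in the training data."""
    return (
        train.groupby(group_col)[label_col]
        .agg(lambda s: s.mode().iloc[0])
        .to_dict()
    )


def predict_baseline(mapping, groups, fallback):
    """Predict the stored label per group, or `fallback` for unseen groups."""
    return [mapping.get(g, fallback) for g in groups]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data.
train = pd.DataFrame({
    "shop": ["x", "x", "x", "y", "y"],
    "label": ["buy", "buy", "skip", "skip", "skip"],
})
test = pd.DataFrame({"shop": ["x", "y", "z"], "label": ["buy", "skip", "buy"]})

mapping = fit_majority_baseline(train, "shop", "label")
fallback = train["label"].mode().iloc[0]  # overall most frequent label
preds = predict_baseline(mapping, test["shop"], fallback)
print(preds)  # ['buy', 'skip', 'skip']
print(accuracy(test["label"].tolist(), preds))
```

Comparing the model's score on the same test set against this number makes it obvious whether the extra complexity is paying for itself.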
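On data quality: the checks that let engineers trust the data do not need special tooling to start with. A minimal sketch, assuming plain pandas DataFrames; `check_quality`, its parameters and the example columns are hypothetical and only illustrate the idea of returning a list of problems that a pipeline step can fail on.

```python
import pandas as pd


def check_quality(df, required, max_null_fraction=0.0, unique=()):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    missing = [c for c in required if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if len(df) == 0:
        problems.append("no rows")
    # Fraction of nulls per required column, compared against a threshold.
    for col in required:
        if col in df.columns and len(df) > 0:
            frac = df[col].isna().mean()
            if frac > max_null_fraction:
                problems.append(f"{col}: {frac:.0%} nulls")
    # Columns that are expected to act as keys must not contain duplicates.
    for col in unique:
        if col in df.columns and df[col].duplicated().any():
            problems.append(f"{col}: duplicate values")
    return problems


batch = pd.DataFrame({"id": [1, 2, 2], "price": [9.5, None, 3.0]})
print(check_quality(batch, required=["id", "price", "date"], unique=["id"]))
# ["missing columns: ['date']", 'price: 33% nulls', 'id: duplicate values']
```

Because the function only returns a list, the same checks can run in a notebook, in a unit test, or as a pipeline step that raises when the list is not empty.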