Machine Learning System Design is one of my favorite aspects of Machine Learning. We start with a business idea or a product concept and deduce from it the whole set of technical requirements needed to build that product. I want to give you my framework, my playbook for designing ML solutions from a business problem.
I am going to use a specific example to illustrate this playbook: how to build the friend suggestion feature on Facebook. We are going to design the different components and look at how this can be framed as a machine-learning solution. Let's get into it.
Machine Learning system design is about translating the business requirements into technical requirements to build a machine learning solution.
Here is the playbook to construct a useful design:
What is the Machine Learning problem?
What are the business metrics?
What are the online metrics?
What are the architectural components?
How do we get the training data?
What are the offline metrics?
What are the features?
What is the model?
Let’s consider the friend suggestion feature on Facebook.
Based on the problem, this is very likely a classification problem.
We need to be able to translate the business requirements into KPIs we can measure.
We need to choose the right online metrics.
As much as possible, we need to find online metrics that correlate to the KPIs.
It is often difficult to choose the same metrics in offline experiments as in online experiments.
Typically, we use metrics that capture the binary aspect of the data.
Again, it is important to choose offline metrics that correlate well with online metrics.
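To make "metrics that capture the binary aspect of the data" concrete, here is a minimal sketch of three common offline metrics for a clicked / not-clicked target: log loss, ROC AUC, and precision at k. The choice of these three metrics and the function names are my own assumptions for illustration, not a prescription; in practice you would likely use a library implementation.

```python
import math


def log_loss(labels, probs, eps=1e-15):
    """Average binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Quadratic-time pairwise version: fine for a sketch, too slow for large data.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def precision_at_k(labels, scores, k):
    """Fraction of positives among the k highest-scored suggestions."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    top = ranked[:k]
    return sum(y for _, y in top) / len(top)


# Hypothetical labels (1 = clicked) and model scores for four suggestions.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(labels, scores), precision_at_k(labels, scores, k=2))
```

Precision at k is often the easiest of the three to relate to an online metric, since it only looks at the suggestions a user would actually see.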
There are a lot of questions we need to ask ourselves when it comes to the architectural components.
Does it need to be real-time?
It could be batched:
Different strategies for different users based on activity
We may not need something faster than hourly
We need to select user candidates: candidate selection
We need to filter (blocked users, privacy policies, …)
We need to rank
We need to add diversity (not always the same users being presented)
In general, the problem can be broken down into specialized sub-modules.
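The four sub-modules listed above (candidate selection, filtering, ranking, diversity) can be sketched as a small pipeline. This is only an illustration under my own assumptions: candidates are friends of friends, the ranking score is the number of mutual friends (a stand-in for a trained model), and diversity simply pushes recently shown users to the back. The `User` class and all function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    user_id: int
    friends: set = field(default_factory=set)
    blocked: set = field(default_factory=set)


def select_candidates(user, users):
    """Candidate selection: friends of friends who are not already friends."""
    candidates = set()
    for friend_id in user.friends:
        candidates |= users[friend_id].friends
    return candidates - user.friends - {user.user_id}


def filter_candidates(user, candidates, users):
    """Filtering: drop anyone blocked in either direction."""
    return {
        c for c in candidates
        if c not in user.blocked and user.user_id not in users[c].blocked
    }


def rank(user, candidates, users):
    """Ranking: placeholder score = number of mutual friends (ties broken by id)."""
    scored = [(len(user.friends & users[c].friends), c) for c in candidates]
    return [c for _, c in sorted(scored, key=lambda t: (-t[0], t[1]))]


def diversify(ranked, recently_shown, k):
    """Diversity: show fresh candidates first, recently shown ones last."""
    fresh = [c for c in ranked if c not in recently_shown]
    stale = [c for c in ranked if c in recently_shown]
    return (fresh + stale)[:k]


def suggest_friends(user_id, users, recently_shown=frozenset(), k=3):
    user = users[user_id]
    candidates = select_candidates(user, users)
    candidates = filter_candidates(user, candidates, users)
    ranked = rank(user, candidates, users)
    return diversify(ranked, recently_shown, k)


# Tiny hypothetical social graph.
users = {
    1: User(1, friends={2, 3}, blocked={6}),
    2: User(2, friends={1, 4, 5}),
    3: User(3, friends={1, 4, 6}),
    4: User(4, friends={2, 3}),
    5: User(5, friends={2}),
    6: User(6, friends={3}),
}
print(suggest_friends(1, users))
```

Keeping each stage as its own function mirrors the idea of specialized sub-modules: the ranking stage could be swapped for a learned model without touching candidate selection or filtering.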
So, how could we implement it for real-time inference?
What about batch?
What about streaming?
There are a lot of questions we need to answer when it comes to the training data:
Where does the training data come from? The feedback loop from user interactions?
What about initially, when no training data exists? Random suggestions?
What is the instance target? Clicked or not?
Should we choose all the instances presented?
Should the training data be different for different devices?
Do we need a maximum window?
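One possible set of answers to the questions above can be sketched in code: take every suggestion that was presented as an instance, label it by whether it was clicked, keep the device as a column so data can later be split per device, and drop anything older than a maximum window. The log schema (`viewer_id`, `candidate_id`, `device`, `shown_at`, `clicked`) and the 30-day default are assumptions made for this example only.

```python
from datetime import datetime, timedelta


def build_training_set(impressions, now, max_window_days=30):
    """Turn impression logs into labeled instances within a time window.

    Each impression is a dict with hypothetical keys:
    viewer_id, candidate_id, device, shown_at (datetime), clicked (bool).
    """
    cutoff = now - timedelta(days=max_window_days)
    rows = []
    for imp in impressions:
        if imp["shown_at"] < cutoff:
            continue  # outside the maximum window
        rows.append({
            "viewer_id": imp["viewer_id"],
            "candidate_id": imp["candidate_id"],
            "device": imp["device"],
            "label": 1 if imp["clicked"] else 0,  # target: clicked or not
        })
    return rows


# Hypothetical impression log.
now = datetime(2024, 1, 31)
impressions = [
    {"viewer_id": 1, "candidate_id": 4, "device": "mobile",
     "shown_at": datetime(2024, 1, 30), "clicked": True},
    {"viewer_id": 1, "candidate_id": 5, "device": "web",
     "shown_at": datetime(2023, 11, 1), "clicked": False},
    {"viewer_id": 2, "candidate_id": 6, "device": "mobile",
     "shown_at": datetime(2024, 1, 15), "clicked": False},
]
training_rows = build_training_set(impressions, now)
print(len(training_rows), [r["label"] for r in training_rows])
```

Whether to keep all presented instances or only a sample of them is still an open design decision; this sketch keeps them all.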
To determine the minimum set of features the models need, it is important to understand the different actors at play.
The actual model we choose is the last step and the least important aspect of the design.