The AiEdge Newsletter

MLOps 101: Feature Stores, Automation, Testing and Monitoring

The fastest emerging field in Machine Learning

Damien Benveniste
Mar 1
Today we dig into the fastest emerging field in Machine Learning: MLOps! It is potentially one of the best career choices in ML these days, but in my opinion, not enough is being written about it. We are going to look at:

  • The Feature Store: The Data Throne

  • Continuous Integration - Deployment - Training (CI / CD / CT) for ML

  • Testing and monitoring in ML


Each week, I try to provide my take on a Machine Learning subject. Subscribe to get each and every issue.

In my opinion, MLOps is one of the greatest innovations in Machine Learning of the past 10 years! Granted, MLOps has actually been around for quite some time, but the processes have become more formalized, and best practices are spreading across companies and industries at a pace we have never seen before. The tools to monitor and automate are becoming much more common. My guess is that becoming an MLOps expert now is one of the best career bets for the years to come!

The Feature Store: The Data Throne

In Machine Learning, the data is king, and the Feature Store is its throne! Do you remember the time when each team was building its own data pipelines, when the data in production was not consistent with the data in development, and when we had no idea how half of the features were created? Those were dark times prior to the era of feature stores! To be fair, not everybody should invest in Feature Stores: if you don't need real-time inference, if you have fewer than ~10 models in production, or if you don't need frequent retraining, a Feature Store may not be for you.

As far as I know, Uber’s Michelangelo was the first ML platform to introduce a feature store. A feature store is exactly what it sounds like: a place where Data Scientists and Machine Learning Engineers from different teams can browse features for their next ML development endeavor. You can rely on the quality of the data, and consistency is ensured between development and production pipelines. The features originate from streaming and batch data sources, and their computations are centralized and version controlled. Typically, a feature store provides monitoring capabilities for concept and data drift, a registry to discover features and their metadata, offline storage for model training and batch scoring, and an online store API for real-time applications.

Let's list some of the advantages of feature stores: 

  • The ability to share features between teams and projects

  • The ability to ensure consistency between training and serving pipelines

  • The ability to serve features at low latency

  • The ability to query the features at different points in time: features evolve, so we need a guarantee of point-in-time correctness (see the sketch after this list)

  • The ability to monitor features even before they are used in production

  • The ability to provide feature governance, with different levels of access control and versioning
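
To make the point-in-time requirement concrete, here is a minimal, self-contained sketch using pandas: for each training label, we attach the latest feature value known at or before the label's timestamp, so no future information leaks into the training set. All the table and column names here are made up for illustration.

```python
import pandas as pd

# Hypothetical feature log: each row is a feature value and the time it became known.
features = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "event_time": pd.to_datetime(
        ["2023-01-01", "2023-01-10", "2023-01-02", "2023-01-12"]),
    "avg_basket_7d": [20.0, 35.0, 12.0, 18.0],
}).sort_values("event_time")

# Training labels, with the time at which each label was observed.
labels = pd.DataFrame({
    "user_id": [1, 2],
    "label_time": pd.to_datetime(["2023-01-11", "2023-01-05"]),
    "churned": [0, 1],
}).sort_values("label_time")

# Point-in-time join: for each label, take the most recent feature value
# at or before label_time, never a future one.
training_df = pd.merge_asof(
    labels,
    features,
    left_on="label_time",
    right_on="event_time",
    by="user_id",
    direction="backward",
)
print(training_df)
```

A feature store essentially performs this kind of join for you, at scale and consistently between the training and serving paths.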

There are tons of vendors available! Feast is an open-source project from Google Cloud and Go-Jek, and it integrates with Kubeflow. AWS has its own feature store as part of SageMaker. ScribbleData, Tecton, and Hopsworks provide feature stores as well, along with other MLOps capabilities.
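
As an illustration, here is roughly what the workflow looks like with Feast's Python SDK. This is a sketch, not a definitive recipe: exact signatures vary between Feast versions, it assumes an already-configured Feast repository in the current directory, and the driver_stats feature view, its features, and the driver_id entity are made-up names.

```python
import pandas as pd
from feast import FeatureStore

# Assumes a Feast repository (feature_store.yaml + feature definitions) already exists here.
store = FeatureStore(repo_path=".")

# Offline store: build a point-in-time correct training set.
entity_df = pd.DataFrame({
    "driver_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2023-01-10", "2023-01-12"]),
})
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_stats:trips_today", "driver_stats:avg_rating"],
).to_df()

# Online store: low-latency lookup at inference time.
feature_vector = store.get_online_features(
    features=["driver_stats:trips_today", "driver_stats:avg_rating"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
```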

You can read more about it here: 

  • MLOps with a Feature Store

  • Feature Store as a Foundation for Machine Learning

  • The Essential Architectures For Every Data Scientist and Big Data Engineer

Continuous Integration - Deployment - Training (CI / CD / CT)

If you are working on ML projects at a big tech company, chances are you are working with some version of Continuous Integration / Continuous Deployment (CI/CD). It represents a high level of maturity in MLOps, with Continuous Training (CT) at the top. This level of automation lets ML engineers focus solely on experimenting with new ideas while delegating repetitive tasks to engineering pipelines and minimizing human errors.

On a side note, when I was working at Meta, the level of automation was extremely high. That was simultaneously fascinating and quite frustrating! I had spent so many years learning how to deal with ML deployment and management that I had learned to like it. I was becoming good at it, and suddenly all that work seemed meaningless as it was abstracted away into some automation. I think this is what many people feel when it comes to AutoML: a simple call to a "fit" function seems to replace what took years of work and experience for some people to learn.

There are many ways to implement CI/CD/CT for Machine Learning but here is a typical process:

  • The experimental phase - The ML engineer wants to test a new idea (let's say a new feature transformation). They modify the code base to implement the new transformation, train a model, and validate that the new transformation indeed yields higher performance. The resulting outcome at this point is just a piece of code that needs to be included in the master repo.

  • Continuous integration - The engineer then creates a pull request (PR) that automatically triggers unit testing (like a typical CI process), but also triggers the instantiation of the automated training pipeline to retrain the model, potentially test it through integration tests or test cases, and push it to a model registry (see the sketch after this list). There is then a manual step where another engineer validates the PR and the performance readings of the new model.

  • Continuous deployment - Activating a deployment triggers a canary deployment to make sure the model fits in the serving pipeline, and runs an A/B test experiment to compare it against the production model. After satisfactory results, we can promote the new model as a replacement for the production one.

  • Continuous training - As soon as the model enters the model registry, it starts to deteriorate, so you might want to activate recurring training right away. For example, each day the model can be further fine-tuned with the new training data of the day and deployed, and the serving pipeline is rerouted to the updated model.
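
To make the continuous integration step above more concrete, here is a minimal sketch of the kind of validation gate the automated training pipeline could run on a PR: retrain a candidate model, compare it against the metric of the current production model, and fail the pipeline if it does not improve. The dataset, model, and production AUC value are placeholders, and the registry push is a hypothetical helper since it depends entirely on your stack.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder data; in a real pipeline this would come from the feature store.
X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Retrain the candidate model with the code change from the PR.
candidate = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
candidate_auc = roc_auc_score(y_val, candidate.predict_proba(X_val)[:, 1])

# Metric of the model currently in production (would come from your metadata store).
production_auc = 0.91

# Gate: fail the pipeline if the candidate is not better than production.
if candidate_auc <= production_auc:
    raise SystemExit(
        f"Candidate AUC {candidate_auc:.3f} <= production {production_auc:.3f}")

# push_to_registry(candidate)  # hypothetical helper, depends on your model registry
print(f"Candidate AUC {candidate_auc:.3f} beats production; ready for canary deployment.")
```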

The Google Cloud documentation is a good read on the subject:

  • MLOps: Continuous delivery and automation pipelines in machine learning

  • Architecture for MLOps using TensorFlow Extended, Vertex AI Pipelines, and Cloud Build

Testing and monitoring

I think most Machine Learning engineers and Data Scientists dislike writing unit tests. I am definitely one of them! You develop a model, you architect the deployment strategy, you potentially build new data pipelines, you document everything, and you still have to write unit tests to test that a float is not a string?! 

Testing and monitoring in Machine Learning tend to be quite different from traditional software development. Where typical software only requires testing the code itself, Machine Learning systems require tests that validate not only the code, but also the data, the model outputs, and the inference pipeline. When it comes to monitoring, beyond the typical latency, memory, CPU, and disk utilization, we also need to monitor the incoming data and the model quality.

What is your ML test score? Google established the following guidelines for testing and monitoring in “The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction”. Ask yourself these questions next time you develop a model:

Testing the data:

  • Are the feature expectations captured in a schema?

  • Are all the features useful?

  • What is the cost of each feature?

  • Are you using features with business restrictions? 

  • Does the data pipeline have appropriate privacy controls?

  • Can new features be added quickly?

  • Are all your features unit tested? (see the sketch after this list)
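
The first and last questions in this list (feature expectations captured in a schema, and features being unit tested) translate naturally into simple tests. Here is a minimal sketch with a hand-written schema dictionary; dedicated tools such as Great Expectations or TensorFlow Data Validation offer much richer versions of the same idea.

```python
import pandas as pd

# Hand-written feature schema: expected dtype and allowed range for each feature.
SCHEMA = {
    "age":           {"dtype": "int64",   "min": 0,   "max": 120},
    "avg_basket_7d": {"dtype": "float64", "min": 0.0, "max": None},
}

def validate_features(df: pd.DataFrame) -> None:
    """Raise an AssertionError if the dataframe violates the schema."""
    for name, spec in SCHEMA.items():
        assert name in df.columns, f"missing feature: {name}"
        assert str(df[name].dtype) == spec["dtype"], f"{name}: unexpected dtype {df[name].dtype}"
        assert not df[name].isna().any(), f"{name}: contains missing values"
        if spec["min"] is not None:
            assert (df[name] >= spec["min"]).all(), f"{name}: value below allowed minimum"
        if spec["max"] is not None:
            assert (df[name] <= spec["max"]).all(), f"{name}: value above allowed maximum"

def test_features_match_schema():
    # Toy batch of features; in a real test this would come from the feature pipeline.
    df = pd.DataFrame({"age": [25, 40], "avg_basket_7d": [12.5, 30.0]})
    df = df.astype({"age": "int64", "avg_basket_7d": "float64"})
    validate_features(df)
```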

Testing the model:

  • Has the code been reviewed?

  • Do the offline metrics correlate with the online ones?

  • Have you tuned all the hyperparameters?

  • How old / stale is the model?

  • Is your model better than a simpler model?

  • Is the model performance good on all the segments of the data?

  • Is your model fair for all groups of people?

Testing the infra:

  • Is the training reproducible? (see the sketch after this list)

  • Is the model specification code unit tested?

  • Do you have integration tests for the whole ML pipeline?

  • Is the model validated before being served?

  • Is there a simple process to debug training or inference on a simple example?

  • Are you using canary testing before serving the model?

  • Can you easily roll back to a previous production model? 
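
The first question in this list, training reproducibility, is one of the easier ones to turn into an automated test. Here is a minimal sketch with a pytest-style assertion, using a placeholder scikit-learn training routine in place of a real pipeline:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def train(seed: int) -> np.ndarray:
    # Placeholder training routine; a real test would call your actual training pipeline.
    X, y = make_classification(n_samples=1_000, n_features=10, random_state=seed)
    model = LogisticRegression(random_state=seed, max_iter=1_000).fit(X, y)
    return model.coef_

def test_training_is_reproducible():
    # Two runs with the same seed and data should produce (numerically) identical weights.
    np.testing.assert_allclose(train(seed=42), train(seed=42), rtol=1e-7)
```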

What should you monitor?

  • Any dependency change

  • If the data changes over time in the training and serving pipelines (see the sketch after this list)

  • If the features differ between the training and serving pipelines

  • If the model is stale

  • If the model generates invalid values

  • If the model training speed, serving latency, throughput or RAM usage changes

  • If the model performance changes
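
For the data-related items in this list, here is a minimal sketch of a scheduled drift check: compare each feature's distribution in the serving logs against the training set with a two-sample Kolmogorov-Smirnov test. Both the choice of test and the p-value threshold are simplistic defaults of my own and would need tuning in practice.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(train_col: np.ndarray, serving_col: np.ndarray,
                 p_threshold: float = 0.01) -> bool:
    """Return True if the serving distribution likely drifted from the training one."""
    _, p_value = ks_2samp(train_col, serving_col)
    return p_value < p_threshold

# Toy illustration with synthetic data.
rng = np.random.default_rng(0)
train_values = rng.normal(loc=0.0, scale=1.0, size=10_000)
serving_values = rng.normal(loc=0.5, scale=1.0, size=10_000)  # shifted mean

if detect_drift(train_values, serving_values):
    print("Feature drift detected: consider retraining or investigating the pipeline.")
```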

Let's play a game. You get 0.5 points for each test or monitoring item if it is done manually, and a full point if it is automated. Sum your points for each of the 4 sections individually and take the minimum among the 4. What is your Testing Score?
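
For reference, here is that scoring rule written out as a few lines of code. The per-section item counts below are made-up placeholders, not the actual rubric items.

```python
# 0.5 points per item done manually, 1.0 per item automated; the final score is
# the minimum over the four sections, following the ML Test Score rubric.
scores = {
    "data":       {"manual": 3, "automated": 2},
    "model":      {"manual": 4, "automated": 1},
    "infra":      {"manual": 2, "automated": 3},
    "monitoring": {"manual": 5, "automated": 0},
}

section_scores = {
    name: 0.5 * counts["manual"] + 1.0 * counts["automated"]
    for name, counts in scores.items()
}
final_score = min(section_scores.values())
print(section_scores, final_score)  # data=3.5, model=3.0, infra=4.0, monitoring=2.5 -> 2.5
```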

If you are finding this newsletter valuable, consider sharing it with friends or subscribing if you haven’t already!
