Many companies are exploring machine learning (ML), but getting a model deployed and into production is no easy feat. Many data scientists lack the tooling they need to properly train, manage, and deploy ML models.
Machine Learning Operations, or MLOps, is commonly viewed as the application of DevOps principles to machine learning systems, which makes collaboration between data scientists and data engineers much easier. By adopting MLOps, engineers can automate many stages of the model development lifecycle.
As we think about the MLOps landscape, we see five broad categories:
- Setup
- Data Pipeline
- Modeling and Evaluation
- Deployment
- Monitoring
SETUP/DATA PIPELINE: We believe the setup and data pipeline stages are well served by an extensive list of existing solutions. Training and modeling largely rely on established frameworks and platforms such as PyTorch, TensorFlow, scikit-learn, and Amazon SageMaker. Ultimately, we believe the least developed parts of the pipeline are deployment and monitoring, and this is where the current opportunity in MLOps lies.
MODELING: Models can be deployed to serve real-time or batch predictions. As expected, deploying models for real-time prediction is considerably harder than for batch (i.e., offline) prediction, because the result is needed immediately and processing time becomes a hard constraint. One common challenge for real-time prediction arises when a request does not contain all the data the model needs to answer it. In that case, the missing data must be fetched from other databases, which may not be optimized for querying single records.
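To make the batch versus real-time distinction concrete, here is a minimal Python sketch. The toy linear model, its feature names, the `ORDERS_DB` dictionary, and the `slow_lookup` helper (standing in for a database that is not optimized for single-record queries) are all hypothetical.

```python
import time
from typing import Dict, List

# Hypothetical toy "model": a weighted sum over named features.
WEIGHTS = {"age": 0.02, "num_orders": 0.1, "avg_basket": 0.005}

def predict(features: Dict[str, float]) -> float:
    """Score one record; every feature named in WEIGHTS must be present."""
    return sum(w * features[name] for name, w in WEIGHTS.items())

# Batch (offline): score a whole table at once; per-record latency barely matters.
def batch_predict(rows: List[Dict[str, float]]) -> List[float]:
    return [predict(row) for row in rows]

# Hypothetical system of record for order features, keyed by user_id.
ORDERS_DB = {"u1": {"num_orders": 12.0, "avg_basket": 40.0}}

def slow_lookup(user_id: str) -> Dict[str, float]:
    time.sleep(0.05)  # simulate an expensive single-record query
    return ORDERS_DB[user_id]

# Real-time (online): the request carries only some features (here, "age"),
# so the missing ones must be fetched before the model can answer.
def realtime_predict(request: Dict) -> float:
    features = {k: v for k, v in request.items() if k in WEIGHTS}
    missing = [name for name in WEIGHTS if name not in features]
    if missing:
        fetched = slow_lookup(request["user_id"])  # adds latency to this request
        features.update({name: fetched[name] for name in missing})
    return predict(features)
```

For example, `realtime_predict({"user_id": "u1", "age": 30.0})` pays the cost of `slow_lookup` on every call, whereas `batch_predict` can read all of its inputs up front.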
DEPLOYMENT: To serve real-time models, teams typically need to precompute the features those models use and store them where they can be read quickly at prediction time (an arrangement often called a feature store). Maintaining, monitoring, and administering this infrastructure is cumbersome and essentially requires an entire engineering team to manage. We think there's an opportunity to automate many of these functions.
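One way to picture the precompute-and-store pattern: an offline job aggregates raw events into per-user features and writes them to a key-value store, which the online path then reads with a constant-time lookup. `InMemoryFeatureStore` and `precompute_features` are hypothetical names, and the staleness check is an assumed example of the upkeep such infrastructure needs; a production system would use a durable, low-latency store rather than a Python dict.

```python
import time
from typing import Dict, List

class InMemoryFeatureStore:
    """Hypothetical minimal online feature store: rows keyed by entity id,
    plus the time each row was written so stale features can be rejected."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, float]] = {}
        self._written_at: Dict[str, float] = {}

    def write(self, entity_id: str, features: Dict[str, float]) -> None:
        self._rows[entity_id] = dict(features)
        self._written_at[entity_id] = time.time()

    def read(self, entity_id: str, max_age_s: float) -> Dict[str, float]:
        """Constant-time lookup at prediction time; raises if missing or stale."""
        if entity_id not in self._rows:
            raise KeyError(f"no features for {entity_id}")
        age = time.time() - self._written_at[entity_id]
        if age > max_age_s:
            raise LookupError(f"features for {entity_id} are {age:.0f}s old")
        return self._rows[entity_id]

def precompute_features(orders: List[Dict], store: InMemoryFeatureStore) -> None:
    """Offline job: aggregate raw order events into per-user features."""
    amounts_by_user: Dict[str, List[float]] = {}
    for order in orders:
        amounts_by_user.setdefault(order["user_id"], []).append(order["amount"])
    for user_id, amounts in amounts_by_user.items():
        store.write(user_id, {
            "num_orders": float(len(amounts)),
            "avg_basket": sum(amounts) / len(amounts),
        })
```

After `precompute_features([{"user_id": "u1", "amount": 30.0}, {"user_id": "u1", "amount": 50.0}], store)`, a prediction-time call to `store.read("u1", max_age_s=3600)` returns both features without touching the raw order data.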
MONITORING: Performed after deployment, monitoring tells engineers when models need to be retrained because of model drift, which typically stems from concept drift (the relationship between inputs and the target changes) or data drift (the distribution of the inputs shifts). Drift occurs because, once deployed, a model ingests new real-world data that can differ from its training data, which can degrade predictive performance.
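A common way to detect shifts in a model's inputs is to compare each feature's distribution in live traffic against the training data. The sketch below computes the Population Stability Index (PSI) for one numeric feature; the equal-width binning, the `eps` smoothing for empty bins, and the function name are assumptions for illustration. PSI only captures data drift in the inputs; detecting concept drift generally requires ground-truth labels.

```python
import math
from typing import List, Sequence

def psi(expected: Sequence[float], actual: Sequence[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference sample (e.g. training
    data) and a live sample of one numeric feature. Bins are equal-width over
    the reference range; live values outside it are clipped into edge bins."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # eps keeps log() finite when a bin is empty in one of the samples
        return [max(c / len(values), eps) for c in counts]

    p_ref, p_live = proportions(expected), proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(p_ref, p_live))
```

A frequently quoted rule of thumb treats a PSI below about 0.1 as stable and above about 0.25 as a significant shift, though thresholds are usually tuned per feature.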
Many monitoring tasks, such as tracking precision and accuracy, are time-consuming and error-prone when done by hand. Few organizations currently have the infrastructure to monitor their key metrics and identify when models need to be retrained. Current strategies center mainly on retraining models on a fixed schedule, which is sufficient for many teams, but there's a massive opportunity for a solution that automatically detects and then reacts to model drift.
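As a sketch of what automatically detecting and reacting to drift might look like, the monitor below tracks accuracy over a sliding window of recently labeled predictions and invokes a callback (for example, one that enqueues a retraining job) when accuracy falls too far below its deployment-time baseline. The class name, its parameters, and the callback are hypothetical.

```python
from collections import deque
from typing import Callable, Deque

class AccuracyDriftMonitor:
    """Hypothetical monitor: tracks accuracy over the most recent `window`
    labeled predictions and calls `on_drift` when it drops more than
    `tolerance` below the accuracy measured at deployment time."""

    def __init__(self, baseline_accuracy: float, window: int,
                 tolerance: float, on_drift: Callable[[float], None]) -> None:
        self.baseline = baseline_accuracy
        self.window = window
        self.tolerance = tolerance
        self.on_drift = on_drift
        self.outcomes: Deque[bool] = deque(maxlen=window)

    def record(self, prediction: int, label: int) -> None:
        """Call once the true label for an earlier prediction arrives."""
        self.outcomes.append(prediction == label)
        if len(self.outcomes) < self.window:
            return  # not enough evidence yet
        accuracy = sum(self.outcomes) / len(self.outcomes)
        if accuracy < self.baseline - self.tolerance:
            self.on_drift(accuracy)  # e.g. trigger a retraining pipeline
            self.outcomes.clear()    # start a fresh window after reacting
```

The window size and tolerance trade reaction speed against false alarms: a small window reacts quickly but is noisy, while a large one is stable but slow to notice degradation.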
If you’re building a company focused on deployment or monitoring for ML, we’d love to chat with you! You can reach out to me at firstname.lastname@example.org.