Top Related Projects
- MLflow: The open source developer platform to build AI/LLM applications and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.
- Kubeflow: Machine Learning Toolkit for Kubernetes
- DVC: 🦉 Data Versioning and ML Experiments
- Pachyderm: Data-Centric Pipelines and Data Versioning
- Seldon Core: An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
- Metaflow: Build, Manage and Deploy AI/ML Systems
Quick Overview
The Microsoft/MLOps repository is a comprehensive resource for Machine Learning Operations (MLOps) best practices and implementations. It provides a collection of tools, templates, and guidelines to help data scientists and ML engineers streamline their ML workflows, from development to production deployment, with a focus on Azure Machine Learning.
Pros
- Offers end-to-end MLOps solutions and best practices
- Provides integration with Azure Machine Learning and other Azure services
- Includes templates and examples for various ML scenarios and frameworks
- Regularly updated with new features and improvements
Cons
- Primarily focused on Azure ecosystem, which may limit applicability for non-Azure users
- Some examples and templates may require advanced knowledge of Azure services
- Documentation can be overwhelming for beginners due to the breadth of content
- May require significant setup and configuration for full implementation
Code Examples
This repository is not primarily a code library but rather a collection of resources, templates, and best practices. Therefore, specific code examples are not applicable in the traditional sense. However, the repository does contain various code samples and templates within its different projects and scenarios.
Getting Started
To get started with the Microsoft/MLOps repository:
- Clone the repository:
  git clone https://github.com/microsoft/MLOps.git
- Navigate to the desired scenario or template folder.
- Follow the README instructions in the chosen folder for specific setup and usage guidelines.
- For Azure Machine Learning integration, ensure you have an Azure subscription and the necessary permissions to create and manage resources.
- Install required dependencies, which may vary depending on the specific scenario or template you're working with.
- Explore the documentation and examples to understand how to implement MLOps practices in your own projects.
Competitor Comparisons
MLflow: The open source developer platform to build AI/LLM applications and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.
Pros of MLflow
- Open-source and vendor-neutral, allowing for greater flexibility and community contributions
- Comprehensive experiment tracking and model registry features out-of-the-box
- Supports multiple programming languages and ML frameworks
Cons of MLflow
- Less integrated with cloud services compared to MLOps
- May require more setup and configuration for enterprise-scale deployments
- Limited built-in support for advanced CI/CD pipelines
Code Comparison
MLflow:
import mlflow
mlflow.start_run()
mlflow.log_param("param1", 5)
mlflow.log_metric("accuracy", 0.85)
mlflow.end_run()
MLOps:
from azureml.core import Run
run = Run.get_context()
run.log("param1", 5)
run.log("accuracy", 0.85)
Summary
MLflow offers a more flexible, open-source approach to MLOps with strong experiment tracking capabilities. It's suitable for various environments and frameworks but may require more setup for large-scale deployments. MLOps, being Microsoft-centric, provides tighter integration with Azure services and potentially easier enterprise-scale implementation, but with less flexibility across different platforms.
Kubeflow: Machine Learning Toolkit for Kubernetes
Pros of Kubeflow
- More comprehensive end-to-end ML platform with a wider range of tools and components
- Better suited for large-scale, distributed machine learning workflows
- Stronger integration with Kubernetes ecosystem and cloud-native technologies
Cons of Kubeflow
- Steeper learning curve and more complex setup process
- Requires more resources and infrastructure to run effectively
- Less focus on Azure-specific integrations and services
Code Comparison
MLOps:
from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig
ws = Workspace.from_config()
experiment = Experiment(workspace=ws, name='my-experiment')
env = Environment.from_conda_specification('my-env', 'environment.yml')
src = ScriptRunConfig(source_directory='.', script='train.py', environment=env)
run = experiment.submit(src)
Kubeflow:
from kfp import dsl

@dsl.pipeline(name='My Pipeline')
def my_pipeline():
    preprocess_op = dsl.ContainerOp(
        name='Preprocess',
        image='preprocess-image:latest',
        arguments=['--input', 'data.csv', '--output', 'processed.csv']
    )
Both repositories focus on MLOps practices, but Kubeflow offers a more comprehensive platform for end-to-end machine learning workflows, while MLOps is more tailored for Azure-specific integrations. Kubeflow provides better support for large-scale, distributed machine learning but comes with a steeper learning curve. MLOps, on the other hand, offers a more straightforward setup process and tighter integration with Azure services.
Pachyderm: Data-Centric Pipelines and Data Versioning
Pros of Pachyderm
- Provides version control for data, enabling reproducibility and data lineage tracking
- Offers a scalable, containerized data pipeline system for complex workflows
- Supports language-agnostic data processing with built-in parallelization
Cons of Pachyderm
- Steeper learning curve compared to MLOps' more straightforward approach
- Requires more infrastructure setup and management
- May be overkill for smaller projects or teams with simpler ML workflows
Code Comparison
MLOps example (Azure ML SDK):
from azureml.core import Workspace, Experiment, ScriptRunConfig
ws = Workspace.from_config()
experiment = Experiment(workspace=ws, name="my-experiment")
src = ScriptRunConfig(source_directory=".", script="train.py")
run = experiment.submit(src)
Pachyderm example:
import python_pachyderm
from python_pachyderm.service import pps_proto

client = python_pachyderm.Client()

# create_pipeline takes protobuf messages (pipeline_name, transform, input)
# rather than the raw JSON pipeline spec.
client.create_pipeline(
    pipeline_name="my_pipeline",
    transform=pps_proto.Transform(cmd=["python", "/train.py"], image="my-image:latest"),
    input=pps_proto.Input(pfs=pps_proto.PFSInput(repo="data", glob="/*")),
)
Both repositories aim to streamline ML workflows, but Pachyderm focuses more on data versioning and pipeline management, while MLOps provides a broader set of tools for the entire ML lifecycle within the Microsoft ecosystem.
Seldon Core: An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
Pros of seldon-core
- Focused on model deployment and serving, with advanced features like A/B testing and canary deployments
- Supports multiple ML frameworks and languages out of the box
- Provides a robust API for model serving and monitoring
Cons of seldon-core
- Narrower MLOps coverage than the MLOps repo, which offers a more end-to-end solution
- Steeper learning curve due to its Kubernetes-native architecture
- May require additional tools for full MLOps pipeline integration
Code Comparison
seldon-core:
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-model
spec:
  predictors:
  - graph:
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/sklearn/iris
    name: default
MLOps:
name: Train and Deploy Model
on:
  push:
    branches: [ main ]
jobs:
  train:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Train Model
      run: python train.py
The seldon-core example shows a Kubernetes deployment of a pre-trained model, while the MLOps example demonstrates a GitHub Actions workflow for training and deploying a model. This highlights the different focus areas of the two projects.
Metaflow: Build, Manage and Deploy AI/ML Systems
Pros of Metaflow
- More lightweight and flexible, easier to get started with
- Better support for local development and debugging
- Stronger focus on data science workflows and experimentation
Cons of Metaflow
- Less comprehensive enterprise features compared to MLOps
- Smaller community and ecosystem of integrations
- More limited support for model deployment and monitoring
Code Comparison
Metaflow:
from metaflow import FlowSpec, step

class MyFlow(FlowSpec):
    @step
    def start(self):
        self.data = [1, 2, 3]
        self.next(self.process)

    @step
    def process(self):
        self.next(self.end)

    @step
    def end(self):
        pass
MLOps:
from azureml.core import Workspace, Experiment
ws = Workspace.from_config()
exp = Experiment(workspace=ws, name="my-experiment")
run = exp.start_logging()
Summary
Metaflow is more focused on data science workflows and local development, while MLOps offers a more comprehensive enterprise-grade MLOps solution. Metaflow is easier to get started with but may lack some advanced features for large-scale deployments. MLOps provides more robust deployment and monitoring capabilities but can be more complex to set up and use.
README
page_type: sample
languages:
- python
products:
- azure
- azure-machine-learning-service
- azure-devops
description: "MLOps end to end examples & solutions. A collection of examples showing different end to end scenarios operationalizing ML workflows with Azure Machine Learning, integrated with GitHub and other Azure services such as Data Factory and DevOps."
Updated MLOps Guidance on Azure (2023)
To learn more about the latest MLOps guidance from Microsoft, review the following links:
- Azure MLOps (v2) Solution Accelerator
- Set up MLOps with Azure DevOps
- Set up MLOps with GitHub
- Official Microsoft documentation on MLOps
- Microsoft Learn course: Introduction to MLOps
- Microsoft Learn course: End-to-end MLOps
MLOps on Azure
What is MLOps?
MLOps empowers data scientists and app developers to help bring ML models to production. MLOps enables you to track / version / audit / certify / re-use every asset in your ML lifecycle and provides orchestration services to streamline managing this lifecycle.
MLOps podcast
Check out the recent TwiML podcast on MLOps here
How does Azure ML help with MLOps?
Azure ML contains a number of asset management and orchestration services to help you manage the lifecycle of your model training & deployment workflows.
With Azure ML + Azure DevOps you can effectively and cohesively manage your datasets, experiments, models, and ML-infused applications.

New MLOps features
- Azure DevOps Machine Learning extension
- Azure ML CLI
- Create event driven workflows using Azure Machine Learning and Azure Event Grid for scenarios such as triggering retraining pipelines
- Set up model training & deployment with Azure DevOps
If you are using the Machine Learning DevOps extension, you can access model name and version info using these release variables (a sketch of reading them follows this list):
- Model name: Release.Artifacts.{alias}.DefinitionName
- Model version: Release.Artifacts.{alias}.BuildNumber
Here {alias} is the source alias set while adding the release artifact.
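Azure DevOps also exposes release variables to scripts as environment variables, with dots replaced by underscores and names uppercased. A minimal Python sketch of reading them, assuming a source alias of model:

import os

# Release variables become environment variables: "." -> "_", names uppercased.
# The alias "model" is an illustrative assumption.
model_name = os.environ.get("RELEASE_ARTIFACTS_MODEL_DEFINITIONNAME")
model_version = os.environ.get("RELEASE_ARTIFACTS_MODEL_BUILDNUMBER")
print(f"Deploying {model_name}, version {model_version}")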
Getting Started / MLOps Workflow
An example repo which exercises our recommended flow can be found here
MLOps Best Practices
Train Model
- Data scientists work in topic branches off of master.
- When code is pushed to the Git repo, trigger a CI (continuous integration) pipeline.
- First run: Provision infra-as-code (ML workspace, compute targets, datastores).
- For new code: Every time new code is committed to the repo, run unit tests, data quality checks, train model.
We recommend the following steps in your CI process (a sketch of the evaluate/register gate follows this list):
- Train Model - run the training code/algorithm and output a model file, which is stored in the run history.
- Evaluate Model - compare the performance of the newly trained model with the model in production. If the new model performs better than the production model, the following steps are executed; if not, they are skipped.
- Register Model - take the best model and register it with the Azure ML model registry, which enables version control.
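A minimal sketch of that evaluate/register gate with the Azure ML SDK (v1), in the style of the snippets above; the metric name accuracy, model name my-model, and output path are illustrative assumptions:

from azureml.core import Run
from azureml.core.model import Model

run = Run.get_context()
ws = run.experiment.workspace

# Metric logged by the training step; "accuracy" is an assumed name.
new_acc = run.get_metrics().get("accuracy", 0.0)

# Look up the currently registered model, if any, and read its tagged metric.
try:
    prod_model = Model(ws, name="my-model")
    prod_acc = float(prod_model.tags.get("accuracy", 0.0))
except Exception:
    prod_acc = 0.0  # nothing registered yet, so the first model always wins

# Register only if the new model beats the production model; otherwise skip.
if new_acc > prod_acc:
    run.register_model(
        model_name="my-model",
        model_path="outputs/model.pkl",  # assumed training output path
        tags={"accuracy": str(new_acc)},
    )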
Operationalize Model
- You can package and validate your ML model using the Azure ML CLI.
- Once you have registered your ML model, you can use Azure ML + Azure DevOps to deploy it (see the sketch after this list).
- You can define a release definition in Azure Pipelines to help coordinate a release. Using the DevOps extension for Machine Learning, you can include artifacts from Azure ML, Azure Repos, and GitHub as part of your release pipeline.
- In your release definition, you can leverage the Azure ML CLI's model deploy command to deploy your Azure ML model to the cloud (ACI or AKS).
- Define your deployment as a gated release: once the model web service deployment in the Staging/QA environment succeeds, a notification is sent to approvers to manually review and approve the release. Once the release is approved, the model scoring web service is deployed to Azure Kubernetes Service (AKS) and the deployment is tested.
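A minimal sketch of deploying a registered model to ACI with the Azure ML SDK (v1) rather than the CLI; the service name, scoring script, conda spec, and sizing are illustrative assumptions:

from azureml.core import Environment, Workspace
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()
model = Model(ws, name="my-model")  # assumed registered model name

# score.py and environment.yml are assumed to exist in the project.
env = Environment.from_conda_specification("my-env", "environment.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Model.deploy(ws, "my-service", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)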
MLOps Solutions
We are committed to providing a collection of best-in-class solutions for MLOps, spanning well-documented, fully managed cloud solutions as well as reusable recipes that can help your organization bootstrap its MLOps practice. These examples are community supported and are not guaranteed to stay up to date as new features enter the product.
All of our examples will be built in the open and we welcome contributions from the community!
- https://github.com/Microsoft/MLOpsPython (reference architecture for MLOps + Python)
- https://github.com/Microsoft/Recommenders (recommender systems with E2E MLOps baked in)
- https://github.com/MicrosoftDocs/pipelines-azureml (CI/CD with the Azure ML CLI)
- https://github.com/Microsoft/MLOps_VideoAnomalyDetection (self-supervised learning with hyperparameter tuning and automated retraining)
- https://github.com/Azure-Samples/MLOpsDatabricks (set up MLOps with Azure ML + Databricks)
- https://github.com/roalexan/azureml#schedule-using-adf (schedule an Azure ML pipeline from an Azure Data Factory pipeline)
- https://www.azuredevopslabs.com/labs/vstsextend/aml/ (automated template to deploy MLOps on Azure DevOps)
- https://github.com/Azure/ACE_Azure_ML/tree/master/devops (set up Azure ML + Azure DevOps together for predictive maintenance)
- https://github.com/microsoft/nlp (natural language processing examples using MLOps + GitHub + Azure)
- https://github.com/microsoft/AIArchitecturesAndPractices
- https://github.com/danielsc/azureml-debug-training/blob/master/Setting%20up%20VSCode%20Remote%20on%20an%20AzureML%20Notebook%20VM.md (set up VS Code Remote on an Azure ML Notebook VM)
- https://github.com/jomit/SecureAzureMLWorkshop (code + scripts for a workshop on building a secure ML platform on Azure)
- https://github.com/Azure/ml-functions-package-demo (package an ML model for use in Azure Functions)
- https://github.com/microsoft/seismic-deeplearning (deep learning for seismic imaging and interpretation)
- https://github.com/CESARDELATORRE/poc-spark-aml/blob/master/spark-job.py (Spark job on AML compute)
- https://github.com/Azure-Samples/azure-machine-learning-pipeline-observability-sample (Azure Machine Learning pipeline run observability)
How is MLOps different from DevOps?
- Data/model versioning != code versioning - how to version data sets as the schema and origin data change (see the dataset-versioning sketch after this list)
- Digital audit trail requirements change when dealing with code + (potentially customer) data
- Model reuse is different than software reuse, as models must be tuned based on input data / scenario.
- To reuse a model you may need to fine-tune / transfer learn on it (meaning you need the training pipeline)
- Models tend to decay over time & you need the ability to retrain them on demand to ensure they remain useful in a production context.
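As an illustration of the first point, a sketch of dataset versioning with the Azure ML SDK (v1); the datastore path and dataset name are assumptions:

from azureml.core import Dataset, Workspace

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Build a tabular dataset from files in the datastore; the path is an assumption.
dataset = Dataset.Tabular.from_delimited_files(path=(datastore, "data/train.csv"))

# create_new_version=True bumps the version whenever the schema or origin
# data change, while older versions stay addressable by number.
dataset = dataset.register(workspace=ws, name="training-data", create_new_version=True)
print(dataset.name, dataset.version)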
What are the key challenges we wish to solve with MLOps?
Model reproducibility & versioning
- Track, snapshot & manage assets used to create the model
- Enable collaboration and sharing of ML pipelines
Model auditability & explainability
- Maintain asset integrity & persist access control logs
- Certify model behavior meets regulatory & adversarial standards
Model packaging & validation
- Support model portability across a variety of platforms
- Certify model performance meets functional and latency requirements
Model deployment & monitoring
- Release models with confidence
- Monitor & know when to retrain by analyzing signals such as data drift (a toy drift check is sketched after this list)
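A toy illustration of one such signal (standardized mean shift between a training batch and a production batch); the data and threshold are arbitrary assumptions, not a production drift detector:

import numpy as np

def drift_score(train: np.ndarray, prod: np.ndarray) -> float:
    """Absolute difference of means, scaled by the training standard deviation."""
    std = train.std()
    return abs(train.mean() - prod.mean()) / (std if std > 0 else 1.0)

rng = np.random.default_rng(0)
train_batch = rng.normal(0.0, 1.0, 1000)  # feature values seen at training time
prod_batch = rng.normal(0.4, 1.0, 1000)   # same feature in production, shifted

if drift_score(train_batch, prod_batch) > 0.3:  # assumed alert threshold
    print("Drift detected: consider triggering the retraining pipeline")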
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Related projects
Microsoft AI Labs GitHub: find other best-practice projects and Azure AI design patterns in our central repository.