Importance of versioning in MLOps

Atul Yadav
2 min read · Jun 20, 2023

Versioning of data, features, and models is an important part of MLOps. It records how these artifacts change over time, which helps with debugging, auditing, and reproducibility.
Here is an end-to-end example of how versioning fits into an MLOps workflow:

1. Data collection. The first step is to collect the data that will be used to train the ML model. This data should be versioned so that any training run can be traced back to, and rerun on, the exact dataset it used.

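2. Data preparation. Once the data is collected, it needs to be prepared for training. This may involve cleaning the data, removing outliers, and transforming the data into a format that the ML model can understand. The preparation steps should be versioned along with the data, since a change in the cleaning rules changes the training set.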

3. Feature engineering. Features are the inputs to the ML model. They are typically derived from the raw data by applying mathematical transformations. Features should be versioned so that they can be tracked and compared.

4. Model training. The ML model is trained on the prepared data and features. The model training process may involve multiple iterations, each of which produces a different version of the model.

5. Model evaluation. Once the model is trained, it is evaluated to see how well it performs, for example on a holdout dataset or on live production data.

6. Model deployment. The best-performing model is deployed to production. The deployment process should record which model version is live, so that the different versions can be tracked, managed, and rolled back if needed.
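
As a rough sketch of the six steps above, the following Python snippet derives version identifiers from content hashes and chains them into a lineage record. Everything in it is an illustrative assumption rather than any particular tool's API: the helper `version_of`, the toy rows, the feature recipe, and the mean-baseline "model" are made up for the example, and a real pipeline would more likely hash files or rely on a dedicated versioning tool.

```python
import hashlib
import json


def version_of(obj) -> str:
    """Return a short content hash: identical content gives an identical version id."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


# Steps 1-2: collected and prepared data (toy rows, made up for illustration).
raw_rows = [{"age": 31, "income": 52000}, {"age": 45, "income": 61000}]
data_version = version_of(raw_rows)

# Step 3: features derived from the data; the version covers the data and the recipe.
feature_recipe = {"income_k": "income / 1000"}
features = [{"age": r["age"], "income_k": r["income"] / 1000} for r in raw_rows]
feature_version = version_of({"data": data_version, "recipe": feature_recipe})

# Step 4: a stand-in "model" (the mean of one feature) plus its hyperparameters.
params = {"kind": "mean_baseline", "target": "income_k"}
model = {"mean": sum(f["income_k"] for f in features) / len(features)}
model_version = version_of(
    {"features": feature_version, "params": params, "weights": model}
)

# Steps 5-6: the lineage record that would travel with the deployed model.
lineage = {
    "data_version": data_version,
    "feature_version": feature_version,
    "model_version": model_version,
    "metrics": {"train_mean": model["mean"]},
}
print(json.dumps(lineage, indent=2))
```

Because each version is computed from its inputs, changing a single raw row changes the data, feature, and model versions together, which is what lets a deployed model be traced back to the data it was trained on.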

Here are some examples of production ML platforms that use versioning:
Google Cloud ML Engine, since succeeded by Vertex AI, uses versioning to track changes to data, features, and models. This helps when debugging, auditing, and reproducing ML experiments.

Amazon SageMaker also uses versioning to track changes to data, features, and models. This makes it easy to manage ML pipelines and deploy models to production.

Azure Machine Learning provides versioning for tracking changes to data, features, and models. This helps teams collaborate on ML projects and reproduce results.
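
The common idea behind such services can be sketched as a small model registry: each registered model gets an incrementing version number, and one version at a time is marked as serving production. This is a toy, in-memory sketch and not the API of any of the platforms named above; `ModelRegistry`, `ModelVersion`, the artifact paths, the data version labels, and the metric values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelVersion:
    number: int
    artifact_uri: str            # hypothetical location of the saved model
    data_version: str            # dataset version the model was trained on
    metrics: Dict[str, float]


@dataclass
class ModelRegistry:
    versions: List[ModelVersion] = field(default_factory=list)
    production: Optional[int] = None

    def register(self, artifact_uri: str, data_version: str,
                 metrics: Dict[str, float]) -> ModelVersion:
        """Store a new model and give it the next version number."""
        mv = ModelVersion(len(self.versions) + 1, artifact_uri, data_version, metrics)
        self.versions.append(mv)
        return mv

    def best(self, metric: str) -> ModelVersion:
        """Return the registered version with the highest value of `metric`."""
        return max(self.versions, key=lambda v: v.metrics[metric])

    def promote(self, number: int) -> None:
        """Mark an existing version as the one serving production."""
        if not 1 <= number <= len(self.versions):
            raise ValueError(f"unknown model version: {number}")
        self.production = number


registry = ModelRegistry()
registry.register("models/churn/v1.pkl", "data-2023-06-01", {"accuracy": 0.81})
registry.register("models/churn/v2.pkl", "data-2023-06-15", {"accuracy": 0.86})

best = registry.best("accuracy")
registry.promote(best.number)
print(f"production model: v{registry.production}")

# Rolling back is just promoting an earlier version again.
registry.promote(1)
print(f"after rollback: v{registry.production}")
```

Keeping every version in the registry, instead of overwriting the deployed model, is the design choice that makes rollback and auditing possible.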

Written by Atul Yadav

MLOps | DataOps | DevOps Practitioner