What is a Model Catalog?
March 10, 2022
A model catalog is an inventory of all model assets in an organization that uses metadata to share knowledge assets created via data science, impose governance standards (including bias and fairness), streamline the usage of models across teams, and provide visibility into the performance and usage of models.
Benefits of a model catalog include:
- Sharing of model assets (ML, rules, etc.) created by any team in the organization.
- Governance over models, including implementations of new and updated regulations created after a model has been deployed in production.
- Insights into model performance and usage.
- Streamlined model operationalization via integrations with IT systems and deployment platforms.
How is a model catalog different from a model registry?
A model registry is typically the staging area between model development and productization. The key goal of a model registry is to store the information needed to package and deploy a model (e.g., pickle files, dependencies, version information). As a result, a model registry is adequate only for ML engineers or IT professionals who need to run models in production.
A model catalog goes well beyond that by enabling:
- Sharing model assets (ML, rules, etc.) created by any team in the organization. More importantly, it supports reuse of these assets by providing not just the bits and bytes of the model but also information about what the model is for and how to use it: its inputs and outputs, how to calibrate it, how to interpret results, etc.
- Governance over models. Just as new regulations have come into play that control and mandate the proper use of data, a new slew of regulations targeting model usage is on the horizon. A model catalog enables the implementation of these regulations. Governance teams can define rules for the approved usage of models (e.g., lending models cannot use PII features) within the catalog and ensure that any model used in the organization follows those rules.
- Insight into model performance and usage. A catalog provides information about how the cataloged models are being used (e.g., number of predictions, which teams are using them), whether each model met its SLAs, and model performance (e.g., the accuracy of the model over time, changes to model performance).
- Streamlined deployment of models. While a model registry stores models (very much like JFrog Artifactory stores software artifacts), a registry does not specify the workflows used to deploy models into different environments; a catalog does. A model catalog, alongside an orchestration tool like Jenkins, can specify how models are registered in the catalog (e.g., only on a Git commit) and the steps and approvals necessary to deploy a model into development and then promote it into production.
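To make the distinction concrete, here is a minimal Python sketch of what a catalog entry might hold beyond registry-style packaging details, along with a governance check and a gated promotion step. It is an illustration only, not Verta's API: the names `CatalogEntry`, `governance_violations`, and `promote`, the list of PII feature names, and the approval roles are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class Stage(Enum):
    REGISTERED = "registered"
    DEVELOPMENT = "development"
    PRODUCTION = "production"


@dataclass
class CatalogEntry:
    # Registry-style packaging details
    name: str
    version: str
    artifact_uri: str
    dependencies: List[str]
    # Catalog-only context that makes the model reusable and governable
    purpose: str
    domain: str
    inputs: List[str]
    outputs: List[str]
    owner_team: str
    stage: Stage = Stage.REGISTERED
    approvals: Set[str] = field(default_factory=set)
    prediction_count: int = 0  # usage insight, updated by serving systems


# Hypothetical rule: lending models may not use PII features.
PII_FEATURES = {"ssn", "date_of_birth", "home_address"}

# Hypothetical workflow: each stage has one successor and a set of sign-offs.
NEXT_STAGE = {
    Stage.REGISTERED: Stage.DEVELOPMENT,
    Stage.DEVELOPMENT: Stage.PRODUCTION,
}
REQUIRED_APPROVALS = {
    Stage.DEVELOPMENT: set(),
    Stage.PRODUCTION: {"governance", "it"},
}


def governance_violations(entry: CatalogEntry) -> List[str]:
    """Return a description of each governance rule the entry breaks."""
    violations = []
    if entry.domain == "lending":
        used = sorted(PII_FEATURES.intersection(entry.inputs))
        if used:
            violations.append(
                "lending model uses PII features: " + ", ".join(used)
            )
    return violations


def promote(entry: CatalogEntry) -> CatalogEntry:
    """Move the entry to its next stage if rules and approvals allow it."""
    target = NEXT_STAGE.get(entry.stage)
    if target is None:
        raise ValueError(f"{entry.name} is already in production")
    violations = governance_violations(entry)
    if violations:
        raise PermissionError("; ".join(violations))
    missing = REQUIRED_APPROVALS[target] - entry.approvals
    if missing:
        raise PermissionError(
            "missing approvals: " + ", ".join(sorted(missing))
        )
    entry.stage = target
    return entry
```

In this sketch, a lending model that lists `ssn` among its inputs is rejected by `promote`, and a compliant model still cannot reach production until both hypothetical sign-offs are recorded. In practice, an orchestration tool such as Jenkins would trigger these steps rather than a direct function call.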
At Verta, we built an MLOps platform with a model catalog at its center, enabling full lifecycle model management and efficient, scalable model operationalization. If you’d like to learn more, please reach out or schedule a demo!
Manasi Vartak, Ph.D., is the founder and CEO of Verta, an MLOps platform that enables data scientists and ML engineers to manage and operate AI-ML models at scale. Verta created the world’s first Enterprise Model Catalog, enabling organizations to leverage and share model assets while bringing governance and IT oversight to AI product development. Manasi previously developed the open-source ModelDB model management system at MIT and worked on optimizing the news feed algorithms at Twitter and ad-targeting at Google.