Operationalization, Model Catalogs, and an EHR for Models at Data+AI Summit
July 08, 2022
Last week the Verta team attended the Data+AI Summit 2022 conference put on by Databricks in the Moscone Center in San Francisco. It was my first in-person conference since 2019 (!), and it was exciting to be back at a live event again, meeting with peers in the AI/ML community. The energy was great in the hallways and in the expo hall, where Intel hosted Verta in their booth. (Thank you to Lakshman Chari with Intel for that!)
Between the talks on the agenda and the conversations in the breaks, here are my three big takeaways from the conference.
Model Operationalization Is Getting Greater Focus
Companies are putting greater emphasis on operationalization: the process of deploying, managing and monitoring their models. This is a natural progression for the industry, because the ETL tools that data engineers use, and the model training and validation tools that data scientists use, are more mature than the tools companies have for managing the “run” side of the stack. The increasing attention that operationalization has received in the past few years is heartening for us here at Verta, since this is our core focus.
Data and Model Catalogs Are Taking Center Stage
I was also encouraged by the heightened interest in data and model catalogs, and especially in the governance and security benefits that catalogs can provide. My take is that the data tool vendors are taking governance more and more seriously, and it’s good to see greater interest in better discovery and inventory of a company’s data assets. We at Verta believe that data governance and model governance go hand in hand, and we see growing recognition that organizations need a model catalog to ensure discoverability, governance and security for their model assets.
An EHR for Models to Enable Governance, Reproducibility and Auditability
Speaking of model catalogs, the topic came up in my presentation at the summit, which covered full-lifecycle model management and the need for an “EHR for models.” (Verta also presented on orchestrating ML deployments with Jenkins, and I’ll address that topic in a follow-up post.)
An “Electronic Health Record” (EHR) ideally collects all of a person’s vital health statistics in one place, everything from height and weight at birth, to vaccinations and surgeries, to allergies and advance medical directives. The goals of an EHR include giving a person better control over their own health by making their health data easier to access, both for themselves and for their doctors and caregivers. This helps reduce health risks over the long term.
In the ML world, the equivalent of an EHR would be a model catalog that provides lineage for a model, stores documentation related to the model, and allows for reproducibility, discoverability and reuse. The goals of an EHR for models include empowering an organization to have better control (i.e., governance) over their models, as well as better risk management and performance management. Verta’s own Model Catalog is intended to offer this kind of single pane of glass for where a model is, how it got there, and how it is performing, thereby improving the “health” of the organization’s model assets.
With models being used in mission-critical applications, and the number of these applications growing rapidly, companies need some way of knowing not only how a model was trained, but also how it was tested, where it is deployed, and, once deployed, whether it is performing according to its SLAs or other expectations. These are the underpinnings of governance, reproducibility, and auditability, and that’s why we believe that this kind of EHR for models will be a necessity for any organization that relies on ML to drive revenue.
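To make the idea concrete, here is a minimal Python sketch of what one entry in such an “EHR for models” might hold. It is purely illustrative and is not Verta’s Model Catalog API: the names ModelRecord, ModelCatalog, meets_sla and audit, along with every field and the latency-based SLA, are assumptions chosen to mirror the questions above (how was it trained, how was it tested, where is it deployed, is it meeting its SLA).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ModelRecord:
    """Hypothetical 'health record' for one model version (illustrative only)."""
    name: str
    version: str
    # How it was trained: e.g. code commit, dataset identifier, hyperparameters.
    training_run: Dict[str, str]
    # How it was tested: metric name -> value.
    test_results: Dict[str, float] = field(default_factory=dict)
    # Where it is deployed: free-form environment or endpoint names.
    deployments: List[str] = field(default_factory=list)
    # Assumed SLA: a latency ceiling in milliseconds (None means no SLA recorded).
    sla_max_latency_ms: Optional[float] = None
    # Latency samples collected by monitoring after deployment.
    observed_latency_ms: List[float] = field(default_factory=list)

    def meets_sla(self) -> Optional[bool]:
        """Return None if there is no SLA or no samples yet; otherwise
        report whether the worst observed latency is within the SLA."""
        if self.sla_max_latency_ms is None or not self.observed_latency_ms:
            return None
        return max(self.observed_latency_ms) <= self.sla_max_latency_ms


class ModelCatalog:
    """Hypothetical in-memory catalog keyed by (name, version)."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str], ModelRecord] = {}

    def register(self, record: ModelRecord) -> None:
        self._records[(record.name, record.version)] = record

    def versions(self, name: str) -> List[str]:
        """Discoverability: list every registered version of a model."""
        return sorted(v for (n, v) in self._records if n == name)

    def audit(self, name: str, version: str) -> Dict[str, object]:
        """Auditability: answer the four questions for one model version."""
        record = self._records[(name, version)]
        return {
            "trained_with": record.training_run,
            "tested_with": record.test_results,
            "deployed_to": record.deployments,
            "meets_sla": record.meets_sla(),
        }
```

A real catalog would persist these records and tie them to access controls and approval workflows; the point of the sketch is only that lineage, test evidence, deployment locations and live performance sit together in one record per model version.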
Use the link below to download a copy of Verta’s presentations from the Data+AI Summit 2022.