Presented by Manasi Vartak, Co-founder & CEO at Verta
More ML models are being built today than ever before. However, whether you are a researcher writing a paper, an actuary, or an ML engineer building ML products, reproducing models, tracking their lineage, and versioning them remain major challenges. The seemingly simple problem of reproducing models has exposed data science teams in regulated industries to hefty fines, cost ML engineers days of remedying issues in production deployments, and cost researchers weeks of re-creating results from papers.
At MIT, we faced these challenges of research reproducibility first-hand and developed an open-source model versioning and management tool called ModelDB. Unlike tools that only perform model tracking (e.g., metrics, hyperparameters, and checkpoints), ModelDB is the first system to version all the ingredients required to create a model: the model code, data, configuration, and environment. Each of these ingredients is snapshotted and stored so that any model can be reproduced from scratch. Since its development, we have used ModelDB to enable reproducible research and model development across many application areas.
In this talk, I will discuss why model versioning is important and why its value continues to increase, present real-world applications where model versioning safeguarded against significant fines and saved hundreds of researcher-hours, and show how any data scientist using Python can make their models reproducible with a simple, open-source tool like ModelDB.