ML Infrastructure: Key Enterprise Technology Trends

Written by Manasi Vartak | November 08, 2022

Resiliency, Risk, Real Time and other key trends in machine learning

I’ve spent the past several weeks meeting with customers and attending industry events across the insurance and financial services sectors, wrapping up with an event put on by a Fortune 50 bank. It was great to meet with peers in person again and to hear from industry leaders and innovators.

Over the course of my travels, five themes in enterprise technology stood out for me as a machine learning practitioner who heads up an ML infrastructure provider. Here are the key enterprise IT trends that I see impacting ML infrastructure today.

1. Resiliency

Resiliency is a top priority for companies in the insurance and financial industries. Resiliency means that infrastructure, software, and applications can never be down. Customers demand the ability to transact business (get a quote, withdraw cash, submit a claim, etc.) at any time of the day, any day of the year — and downtime is not acceptable.

In the context of ML Infrastructure, we see this need across every enterprise that is using models to make production decisions: A customer needs to be able to search their Slack workspace, get digital content recommendations, obtain a policy quote, and safely conduct transactions 24/7/365 without risk of fraud. This means that production ML Infrastructure, unlike training infrastructure, must have extremely high uptime guarantees, high availability, and disaster recovery baked in.

2. Shift Left

It was great to see many senior executives highlighting the “Shift Left” concept. Shift Left is the practice of identifying and fixing software issues early in the development cycle, and it’s essential for increasing the velocity of app development. Tools that support Shift Left can significantly improve the app developer experience.

In the context of ML Infrastructure, there is much to be done to enable Shift Left during ML development and deployment. In fact, Verta CTO Conrado Miranda has spoken frequently on this topic, and noted that almost all enterprises deploying production ML models are thinking about integrating processes and tools to test models before they make it to production.
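To make the idea concrete, here is a minimal sketch of what a Shift Left check for a model might look like: a gate that evaluates a candidate model on held-out examples before it is allowed to ship. Everything in it is an assumption for illustration, including the names pre_deployment_gate, GateResult, and toy_model and the 0.9 accuracy threshold; it is not Verta's API or any particular team's process.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class GateResult:
    passed: bool
    accuracy: float
    reason: str


def pre_deployment_gate(
    predict: Callable[[Sequence[float]], int],
    examples: Sequence[Tuple[Sequence[float], int]],
    min_accuracy: float = 0.9,  # hypothetical threshold, set per use case
) -> GateResult:
    """Evaluate a candidate model on held-out examples before release."""
    if not examples:
        return GateResult(False, 0.0, "no evaluation examples provided")
    correct = sum(1 for features, label in examples if predict(features) == label)
    accuracy = correct / len(examples)
    if accuracy < min_accuracy:
        reason = f"accuracy {accuracy:.2f} below {min_accuracy:.2f}"
        return GateResult(False, accuracy, reason)
    return GateResult(True, accuracy, "ok")


# Hypothetical stand-in for a trained model: predicts 1 when the first
# feature is positive, otherwise 0.
def toy_model(features: Sequence[float]) -> int:
    return 1 if features[0] > 0 else 0


# Four held-out examples as (features, expected label); the last one is
# deliberately a case the toy model gets wrong.
holdout = [([1.0], 1), ([-2.0], 0), ([0.5], 1), ([-0.1], 1)]
result = pre_deployment_gate(toy_model, holdout, min_accuracy=0.9)
print(result)
```

Run in a CI pipeline, a check like this would block the release step whenever result.passed is false, so the problem surfaces during development and not in production.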

3. The Supply Chain for Data and Models

One data governance leader I spoke to made the crucial observation that data supply chains are subject to increasing regulatory scrutiny. As a result, every data source (alternative or not) needs to be cataloged, documented and governed in keeping with the nature of the data and applicable regulations. (It’s no surprise that the data catalog company Collibra was recently given the partner of the year award by Bank of America!) The need for data governance is not new, and yet for me it was the supply chain aspect that was telling: it’s not only the source data but also the data products resulting from combining these data sources that need a robust supply chain — i.e., lineage.

For ML, the need for lineage and supply chain scrutiny is even more essential. Given the complexity of ML models, their frequent black-box nature, and heavy dependence on training data, knowing where a model came from, what data it was trained on, what features were used, and how it was tested are all part of the model supply chain. It is essential that this information be recorded both for governance purposes and to have confidence in the models themselves. As a result, we are seeing greater awareness and investments from enterprises in their model supply chains.
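As a rough illustration, the model supply chain information described above (where a model came from, what data it was trained on, what features were used, and how it was tested) could be captured in a simple record like the one below. This is a sketch under stated assumptions: the ModelLineage class, its field names, and the example values are all hypothetical and are not drawn from any specific product or regulation.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ModelLineage:
    """Hypothetical lineage record for one model version."""
    model_name: str
    model_version: str
    training_data_sources: List[str]
    features: List[str]
    test_results: Dict[str, float] = field(default_factory=dict)

    def fingerprint(self) -> str:
        # Hash a canonical JSON form so any change to the record
        # produces a different identifier.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


# Illustrative values only.
record = ModelLineage(
    model_name="claims-fraud-detector",
    model_version="1.2.0",
    training_data_sources=["claims_2021.parquet", "policy_master.csv"],
    features=["claim_amount", "days_since_policy_start"],
    test_results={"holdout_auc": 0.91},
)
print(record.fingerprint())
```

Storing the fingerprint alongside the deployed model gives auditors a way to confirm that the documented lineage matches what is actually running.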

4. Risk Management

For large financial organizations that handle their customers’ most sensitive information, managing risk is a necessity across all parts of the business. Witness the fines that have been handed out for poor enterprise risk management in the financial sector, hitting as high as $400 million. Organizations must work proactively to reduce risk wherever it might be found, whether it’s related to handling sensitive data, classifying data as PII, using models that may have unexpected consequences, locating data centers, or choosing key technology vendors.

From the lens of ML infrastructure, managing risk has to do with (a) knowing what models you have and where they came from (the supply chain point above), (b) ensuring that the right processes are used to test and deploy the model (shift left from above), and (c) most importantly, monitoring models once they are deployed to ensure that they continue to make meaningful and high-quality decisions. It’s no wonder that we see heightened interest in enterprise model management and model monitoring among companies in sectors like banking and insurance.
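One way to picture point (c) is a drift check that compares the scores a model produces in production against the scores it produced at validation time. The sketch below uses the population stability index (PSI), a measure often used for this kind of comparison. It assumes scores fall in the range 0 to 1, uses ten equal-width bins, and treats the 0.2 alert threshold as a hypothetical setting; none of these choices come from the article.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between two samples of scores assumed to lie in [0, 1]."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    def proportions(scores: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for s in scores:
            # Clamp so that out-of-range scores land in the edge bins.
            idx = max(0, min(int(s * bins), bins - 1))
            counts[idx] += 1
        # Floor at eps so empty bins do not cause log(0) or division by zero.
        return [max(c / len(scores), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Illustrative data: validation-time scores versus shifted production scores.
baseline_scores = [0.1, 0.2, 0.3, 0.4, 0.5]
live_scores = [0.6, 0.7, 0.8, 0.9, 0.95]

ALERT_THRESHOLD = 0.2  # hypothetical; tune per model and risk appetite
psi = population_stability_index(baseline_scores, live_scores)
drift_detected = psi > ALERT_THRESHOLD
print(f"PSI={psi:.2f}, drift_detected={drift_detected}")
```

A production monitor would run a comparison like this on a schedule and route alerts to the model's owners, but the core idea is the same: quantify how far live behavior has moved from what was tested.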

5. Real-Time Analytics

This is a trend that we at Verta are acutely aware of and have seen repeated across the board. In fact, our latest State of MLOps research study highlighted that 54% of use cases are currently real-time, while more than two-thirds (69%) of study participants say that real-time use cases will increase or increase significantly over the next three years. Moreover, many leaders in banking see the need to move all analytics (BI and modeling) to real-time. Personalization and competitive advantage often drive the need for real-time analytics. However, as noted at the innovation event, most enterprises have a robust data infrastructure that is built for offline use cases, often centered around data warehouses. Moving to real-time analytics requires a completely different tool stack oriented around Operational AI, and enterprises often need support in implementing such a stack.

For ML infrastructure, real-time analytics is driving the need for real-time model serving infrastructure. Current ML systems have focused on training and so have not been built to serve large-scale predictions in real time. This is often the realm of software engineering/IT rather than data science or data infrastructure. As a result, we see strong demand building for real-time model serving stacks as distinct from other ML software.

Learn how you can implement Operational AI to become an AI-forward organization with Verta’s Guide to Operational AI whitepaper.
