10 May 26

Stop Measuring AI at the Model Level: The Shift to Workflow Performance Metrics

Stephanie Denino (Head of Advisory, FOUNT)

AI evaluation should focus less on whether the model performs and more on whether the work performs better. Usage and accuracy can look strong while employees face more review, handoffs, or friction. The real ROI signal comes from measuring workflow performance before and after deployment.

•

Our approach

7 min read

There is a framing issue at the center of most AI evaluation efforts, and it is distorting the signal.

Leaders are asking whether the model is performing: accuracy, latency, usage.

The more useful question is whether the work is performing better. For every workflow being transformed by AI, and for every role involved, the real question is: did this make it easier and faster for workers to reach a better outcome?

These are different questions that require different data, and they lead to very different conclusions about whether an AI investment is working.

A CIO magazine article on rescuing failing AI initiatives put it plainly: leaders need to shift from model performance metrics to workflow performance metrics. The technology can be working perfectly and the work can still be worse. Employees may be using the tool, as clicks and logins confirm, but if they are also doing more manual review, navigating more unclear handoffs, or spending more time reconciling AI outputs with reality, adoption is not translating into value.

The organizations making genuine progress on AI ROI have learned to separate these two signals. Model performance tells you whether the technology is functioning. Workflow performance tells you whether it is creating value in the context of real work.

Workflow performance is harder to measure. It requires getting inside the work itself: how effort is distributed, where time goes, and what has improved or gotten worse since the AI was introduced. System data captures some of this, but much of it requires direct input from the workers running the workflows, who know the full picture in a way no dashboard can reconstruct.

The shift also matters for how organizations diagnose problems. When AI underperforms, leaders often look at the tool first: model quality, prompt engineering, integration. Those are worth checking, but more often the diagnosis points to something surrounding the AI, such as a workflow that was never redesigned to accommodate it, a role left unclear, or a data source the AI cannot access.

Those problems are invisible at the model level. They only become visible when you measure the workflow.

For every AI deployment worth measuring, build in a workflow performance baseline before go-live, then remeasure at regular intervals. The delta between those measurements, not the model metrics, is where your ROI signal lives.
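As a minimal sketch of that baseline-and-remeasure comparison, the Python below computes the change in a few workflow measures between a pre-go-live baseline and a later remeasurement. The measure names (`minutes_per_case`, `handoffs_per_case`, `manual_review_rate`) and every number are hypothetical placeholders chosen for illustration, not figures from any real deployment. In practice the inputs would come from system data combined with direct input from the workers running the workflow.

```python
from dataclasses import dataclass

# Hypothetical measures; for all three, lower is better.
MEASURES = ("minutes_per_case", "handoffs_per_case", "manual_review_rate")


@dataclass(frozen=True)
class WorkflowSnapshot:
    """One measurement of a workflow at a point in time (illustrative fields)."""
    label: str
    minutes_per_case: float     # average time to complete one case
    handoffs_per_case: float    # average handoffs between roles per case
    manual_review_rate: float   # share of cases needing manual review (0 to 1)


def workflow_delta(baseline: WorkflowSnapshot, current: WorkflowSnapshot) -> dict:
    """Return current minus baseline for each measure (negative = improved)."""
    return {m: getattr(current, m) - getattr(baseline, m) for m in MEASURES}


def worsened(delta: dict) -> list:
    """List the measures that moved in the wrong direction since baseline."""
    return [m for m in MEASURES if delta[m] > 0]


# Made-up numbers: the work got faster, but handoffs and review load grew.
baseline = WorkflowSnapshot("before go-live", 30.0, 2.0, 0.25)
remeasure = WorkflowSnapshot("90 days after", 27.0, 2.5, 0.5)

delta = workflow_delta(baseline, remeasure)
regressions = worsened(delta)

for measure, change in delta.items():
    print(f"{measure}: {change:+.2f}")
print("Worse than baseline:", regressions)
```

In this invented example, time per case improves while handoffs and manual review both get worse. That is the pattern a model-level dashboard would miss, and it only shows up because a baseline was captured before go-live.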

Related Resources

Fresh perspectives about reducing work friction and improving employee experiences.

Foundations

•

April 22, 2026

Who Is Accountable for Whether the Work Actually Improved?

Ask any large organization who owns the CRM, and you will get a name. Ask who owns the staffing policy, the onboarding process, or the new AI assistant, and you will get names. Then ask who is accountable for whether the workflow those things support actually got better last quarter, and you will not get an answer.

The gap is not one of effort or talent. Every function around a workflow is doing its job. IT shipped the tool, HR updated the role, Ops revised the process, and the AI team deployed the agent. Each function has metrics that say it is succeeding, and those metrics are not wrong. They just do not show whether the workflow the worker performs improved after all those changes landed in it.

The worker is the only person who experiences the combined effect. She runs the workflow across all of it: the tool, the policy, the process, the data, the supporting teams, the handoffs, and now the AI. When the pieces do not fit, she reconciles them in the flow of the day, and the combined friction appears on no function’s dashboard.

Design accountability is the missing piece. It means each functional owner can see, and is answerable for, how what they own affects whether the work improves. Not whether the tool shipped or the policy updated, but whether the workflow that runs across them got faster, easier, and better at producing the outcome.

This is a different demand than asking functions to coordinate more. Coordination without a shared measure of the work produces alignment meetings, not alignment. Design accountability requires an instrument: a quantified, recurring picture of how the workflow performs from the perspective of the person running it. With that picture, each owner can see the effect of their piece on the whole. Without it, accountability has nothing to attach to.
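One way to picture such an instrument, as a rough sketch only: collect worker-reported friction scores for each step of the workflow every period, then average them so each owner can see how their piece of the workflow is trending. The step names, the 1 to 5 scale, the periods, and the scores below are all assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical worker responses: (period, workflow step, friction score).
# Scores use an assumed 1-5 scale where 5 means the most friction.
RESPONSES = [
    ("2026-Q1", "intake", 4),
    ("2026-Q1", "intake", 2),
    ("2026-Q1", "review AI draft", 3),
    ("2026-Q2", "intake", 2),
    ("2026-Q2", "intake", 2),
    ("2026-Q2", "review AI draft", 5),
    ("2026-Q2", "review AI draft", 4),
]


def friction_by_step(responses):
    """Average the worker-reported friction score for each (period, step) pair."""
    grouped = defaultdict(list)
    for period, step, score in responses:
        grouped[(period, step)].append(score)
    return {key: fmean(scores) for key, scores in grouped.items()}


picture = friction_by_step(RESPONSES)
for (period, step), score in sorted(picture.items()):
    print(f"{period}  {step:<16} {score:.1f}")
```

Repeating the same questions every period is what makes the picture recurring: the owner of the intake step and the owner of the AI step each see their own trend, measured from the worker's side.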

AI raises the stakes. Every function is now changing the work faster, with more autonomy and more capital behind it. The organization is already redesigning work. What it has not decided is who is accountable for whether the work gets better. Until someone is answerable for that question, AI investment will keep improving the pieces without improving the work.

Workflow intelligence exists to make the question answerable. Deciding who must answer it is a management choice, and it costs nothing to make.

The Problem

•

June 22, 2026

AI Landed in Your Team’s Workflow. How Would You Know If It’s Working?

If you lead a customer care team, a sales force, a field operation, or a shared services function, the last few years have probably looked something like this:

AI tools got deployed to your team. You were consulted on the use case, maybe. IT handled the rollout, HR handled the change communications, and the AI team tracked adoption. Then everyone waited to see if your numbers moved.

Sometimes they did. Often they did not, and when they did not, the conversation got complicated fast.

This is the operational leader’s AI problem, and it is different from the one the AI team is solving.

The AI team is asking whether the tool is being used and whether the model is performing. Those are legitimate questions, but they are not yours. Your questions are whether your team is reaching better outcomes, whether the work is getting easier and faster, and whether your people can do the job better than they could six months ago or are simply doing it differently, with the same friction and a new interface on top.

Most organizations do not have a clean answer to those questions. The dashboards that exist were built to track tool adoption, not workflow performance. They tell you what your people click, not what gets in their way.

Here is what we see consistently across the operational functions we work with: the AI is often the smallest part of the problem. The friction holding back productivity was there before the AI arrived: unclear decision rights, handoffs that break down between roles, data that lives in three systems and gets reconciled by hand, and escalation paths nobody can quite explain. These are workflow problems rather than technology problems, and they do not show up in an adoption metric.

What operational leaders need, and rarely have, is a clear picture of how work actually unfolds for the people on their teams, the lived version rather than the process map: where time goes, where effort concentrates, and what people work around every day and why.

That picture has two immediate uses. First, it tells you where to push back when AI investments are not delivering: not “the tool doesn’t work,” but “here is the specific friction point in the workflow preventing adoption from translating into outcomes.” That is a conversation you can have with specificity instead of frustration.

Second, it gives you the evidence to make the case for the operational changes your team actually needs, such as process clarity, role definition, and decision rights, which often get deprioritized in favor of the next technology deployment.

You are the leader accountable for outcomes, and your team runs the workflows. They know exactly where things break down. The question is whether anyone is asking them specifically, regularly, and with enough structure to turn their answers into something you can act on.

If the answer is no, that is where to start.

Next Horizon

•

June 18, 2026

AI High Performers Redesign Workflows First. Here’s What That Actually Means.

McKinsey’s State of AI research identified something significant: the top 6% of organizations in AI performance are nearly three times as likely as others to fundamentally redesign their workflows when deploying AI. The difference is structural, not marginal.

The instinct in most organizations is to identify a use case, select a tool, stand up training, and launch. That is process thinking: sequential, organized around the technology. AI high performers do something different. They understand the work before they change it, and they treat workflow redesign as a precondition for successful deployment rather than a follow-on activity.

What does workflow redesign actually mean in practice? It is worth being specific, because the term gets used loosely.

It does not mean updating process flows, changing the technical architecture, or revising job descriptions. All of those may happen as a result, but they are outputs of workflow redesign rather than the thing itself.

Workflow redesign means looking through the worker’s lens at how a specific goal gets accomplished, and deliberately defining what changes in that sequence now that AI is part of the picture. That means deciding where AI and human steps should be restructured to reduce handoffs, where AI creates output requiring judgment the current process does not account for, what can be collapsed, automated, or eliminated, and what new friction the AI creates that must be designed around.

This requires a clear view of the workflow as it currently exists, the real version rather than the process map: where time goes, where effort concentrates, and what workers do that appears in no documentation.

Most organizations do not have that picture when they deploy AI. They are redesigning from an abstraction rather than from reality, and the gap between the intended workflow and the lived one is where AI deployments lose their ROI.

The organizations pulling ahead treat workflow intelligence as infrastructure, something built and maintained rather than commissioned once for a transformation initiative. They go into every AI deployment with a clear, worker-informed view of the workflows they are about to change. They measure, redesign, and remeasure.

That discipline is what separates the 6% from everyone else.
