Choose your first AI agent process with a four-criterion decision matrix: volume, rule clarity, error cost and data readiness. Yüce Zerey's field guide.
A chief data officer I know made an expensive mistake last year. He promised the board an AI agent, and to make a splash he picked the hardest process in the company: a complex workflow handling individual customer complaints. Three months of effort followed. The agent hallucinated, quoted a customer the wrong refund amount, and the board panicked. The project was shut down. The CDO moved to another company within the year.
Your first AI agent will either earn you the right to build a second one or close the door on the whole programme. This guide sets out a four-criterion decision matrix, drawn from Yüce Zerey's advisory casework, for choosing the process where your first AI agent wins: volume, rule clarity, error cost and data readiness.
The same CDO repeated the investment at his next company. This time he chose a different process: IT helpdesk password resets. The agent was live in six weeks. A year later the company was running twelve agents across its operations. The single variable that changed was the process chosen for the first AI agent.
Ask a leadership team "which process should our agent handle?" and you collect biases. The managing director says customer service, because it is the most visible. The CFO says finance operations, because it employs the most people. The CEO says sales, because revenue sits there. All three can be wrong.
Criteria select the right process; intuition selects the loudest one. The Stanford Digital Economy Lab's Enterprise AI Playbook, which analysed 51 successful deployments across 41 companies, found that the first factor separating winners is mapping the workflow before selecting the technology. Companies that choose their pilot with structured criteria go on to scale; companies that choose on instinct stall at the pilot stage. The wider numbers show how common stalling is: MIT research reported by Fortune found 95% of generative AI pilots failing, and Deloitte's AI ROI analysis describes the same paradox of rising investment and elusive returns.
Four criteria settle the question. Let's take them one at a time.
First question: how many times a month does this process repeat?
An agent is a serious build investment, and low-volume processes never pay it back. A workflow that runs five times a week will still be repaying its build cost in five years. A workflow that runs a hundred times a day can pay it back within six months.
The threshold: at least 1,000 repetitions a month. A thousand and above is the sweet spot; ten thousand and above is close to a guaranteed return.
Processes that clear the bar: IT helpdesk tickets, customer queries, invoice processing, lead qualification, contract review in large organisations, stock checks, fraud screening.
Processes that miss it: annual strategy planning, senior executive hiring, quarterly board reporting, one-off project work. These carry high value at low volume, which makes them the right territory for a copilot and the wrong territory for an agent.
Years of marketing automation decisions taught me the same arithmetic. Four campaigns a year get run manually; forty campaigns a month justify automation.
The verdict: a process that repeats 1,000 or more times a month passes criterion 1. Anything below comes off your first AI agent shortlist.
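The volume test is simple arithmetic, and it helps to put every candidate on the same monthly footing before comparing. A minimal sketch, assuming a 30-day month and an average of 52/12 weeks per month; the function names are illustrative, not part of the framework:

```python
WEEKS_PER_MONTH = 52 / 12  # assumption: average number of weeks in a month
DAYS_PER_MONTH = 30        # assumption: the process runs every calendar day

VOLUME_THRESHOLD = 1_000   # repetitions per month, from criterion 1


def monthly_volume(runs: float, per: str) -> float:
    """Convert a run rate stated per day, week or month into runs per month."""
    factors = {"day": DAYS_PER_MONTH, "week": WEEKS_PER_MONTH, "month": 1}
    return runs * factors[per]


def passes_volume(runs: float, per: str) -> bool:
    """Criterion 1: at least 1,000 repetitions a month."""
    return monthly_volume(runs, per) >= VOLUME_THRESHOLD


print(passes_volume(5, "week"))   # False: roughly 22 runs a month
print(passes_volume(100, "day"))  # True: 3,000 runs a month
```

On these assumptions, the five-times-a-week workflow lands far below the bar, while the hundred-times-a-day workflow clears it three times over.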
Second question: do the decisions in this process follow explicit rules, or do they need deep judgement?
A rule-clear process sounds like this: "If the invoice exceeds £5,000, escalate for additional approval." "If the complaint falls in category 2, route it to the sales team." Input connects to output visibly.
A judgement process sounds like this: "What is the right solution for this customer?" "Should we hire this candidate?" "What is the long-term strategic value of this contract?" Those calls need context, experience and intuition, and on all three the current generation of agents trails humans badly. Agents excel where rules are explicit. Pick a rule-clear process for your first AI agent.
The Stanford HAI AI Index 2026 puts the gap on record: agents reached 66% on technical benchmarks, yet 89% of enterprise agent projects never reach production. Technical capability fails to explain that distance. Agents hold their ground in rule-clear processes and stall in processes that demand judgement.
A pattern I keep meeting in supply chain agent pilots across sectors: extracting the replenishment rule is easy ("if stock cover falls below 30 days, raise a purchase order automatically"), while "which supplier should we buy from?" is deep judgement. The fix is to give the agent two separate jobs. Stock-level monitoring runs autonomously, because the rule is clear; supplier selection runs as a human-agent hybrid, because judgement is needed.
The verdict: if 80% or more of the decisions can be made by rule, the process passes criterion 2. A mixed process can still work when the judgement parts are routed to human approval. A pure judgement process makes an agent the wrong technology.
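The split between autonomous and hybrid work can be sketched in a few lines. This is an illustration only: the `Decision` class, the `route` function and the idea of a decision log are assumptions made for the example, while the 30-day stock cover rule and the 80% threshold come from the text.

```python
from dataclasses import dataclass

STOCK_COVER_THRESHOLD_DAYS = 30  # from the replenishment rule quoted above
RULE_SHARE_THRESHOLD = 0.80      # criterion 2: share of decisions made by rule


@dataclass
class Decision:
    name: str
    rule_based: bool  # True when an explicit rule maps input to output


def route(decision: Decision) -> str:
    """Send rule-based decisions to the agent and judgement calls to a human."""
    return "agent" if decision.rule_based else "human_approval"


def needs_purchase_order(stock_cover_days: float) -> bool:
    """The replenishment rule: raise an order when cover falls below 30 days."""
    return stock_cover_days < STOCK_COVER_THRESHOLD_DAYS


def passes_rule_clarity(decision_log: list[Decision]) -> bool:
    """Criterion 2: at least 80% of logged decisions were made by rule."""
    if not decision_log:
        return False
    rule_share = sum(d.rule_based for d in decision_log) / len(decision_log)
    return rule_share >= RULE_SHARE_THRESHOLD


monitoring = Decision("stock-level monitoring", rule_based=True)
supplier = Decision("supplier selection", rule_based=False)
print(route(monitoring))                          # agent
print(route(supplier))                            # human_approval
print(needs_purchase_order(stock_cover_days=24))  # True
```

Measuring the rule share against a log of real decisions, not a list of decision types, keeps a rare judgement call from sinking a process that is rule-driven almost all of the time.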
Third question: what happens when the agent decides wrongly?
This is the critical one, because every agent makes mistakes. The real question is whether the cost of a mistake is one you can carry. Error cost falls into three bands: low, medium and high.
Keep high-error-cost processes away from your first AI agent. Start where errors are cheap or correctable; once the agent has proved itself, move into higher-risk processes with a human-in-the-loop design.
Klarna's AI assistant episode carries the lesson: a high-volume customer service process where quality fell on complex interactions, until the company publicly conceded it had over-rotated on cost and brought human agents back. Stronger human-in-the-loop controls would have surfaced the decline far earlier. Moffatt v Air Canada reads as the same lesson from another sector: a tribunal held the airline liable for its chatbot's incorrect advice.
The verdict: if the average error costs under £100 or is easily corrected, the process passes criterion 3. Otherwise human-in-the-loop design becomes mandatory, and the pilot grows more complex.
Fourth question: can the agent reach the data it needs, and is the quality good enough?
An agent runs on two kinds of data: historical data to learn from and real-time data to decide with. Both must be reachable and clean.
The access problem looks like this: the data sits across four systems, manual entry bridges the gaps, and the systems refuse to talk to each other. In that situation you face six months of integration work before your first AI agent can run.
The quality problem looks like this: 30% of customer records incomplete, 15% contradictory. Feed that to an agent and you get garbage in, garbage out. Cleansing comes before agents, every time.
JPMorgan's agent success rested on years of patient data consolidation investment. Without the data there is no agent.
I hear the same plan constantly: "We're building a customer segmentation agent." Then you look at the data: seven systems, millions of customer records, 22% of them duplicates. That company has eight months of integration and cleansing ahead of it before any agent. In my casework, organisations that respect this order succeed; organisations that skip it burn budget.
The verdict: if the process data can be pulled from a single system at sufficient quality (85%+ complete and accurate), criterion 4 passes. Otherwise the data work comes first.
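A first pass at this check can be automated before anyone commits to a pilot. The sketch below is a rough illustration under stated assumptions: the field names in `REQUIRED_FIELDS` are a hypothetical schema, records are plain dictionaries, and only completeness and duplicates are measured. Accuracy still needs a manual sample against a trusted source.

```python
REQUIRED_FIELDS = ("customer_id", "email", "segment")  # hypothetical schema
QUALITY_THRESHOLD = 0.85  # criterion 4: 85%+ of records usable


def completeness(records: list[dict]) -> float:
    """Share of records with every required field present and non-empty."""
    if not records:
        return 0.0
    complete = sum(
        all(r.get(f) not in (None, "") for f in REQUIRED_FIELDS) for r in records
    )
    return complete / len(records)


def duplicate_rate(records: list[dict], key: str = "customer_id") -> float:
    """Share of records whose key repeats an earlier record."""
    if not records:
        return 0.0
    seen, duplicates = set(), 0
    for r in records:
        value = r.get(key)
        if value in seen:
            duplicates += 1
        else:
            seen.add(value)
    return duplicates / len(records)


def passes_data_readiness(records: list[dict], source_systems: int) -> bool:
    """Criterion 4: one source system and completeness at or above 85%."""
    return source_systems == 1 and completeness(records) >= QUALITY_THRESHOLD


sample = [
    {"customer_id": 1, "email": "a@example.com", "segment": "retail"},
    {"customer_id": 2, "email": "", "segment": "retail"},        # incomplete
    {"customer_id": 1, "email": "a@example.com", "segment": "retail"},  # duplicate
]
print(passes_data_readiness(sample, source_systems=1))  # False
```

Running the same functions over an export from each source system gives a quick sense of how much cleansing sits between the organisation and its first agent.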
| Process | Criterion 1: Volume (1,000+/month) | Criterion 2: Rule clarity (80%+) | Criterion 3: Error cost (low/medium) | Criterion 4: Data readiness | Score |
| --- | --- | --- | --- | --- | --- |
| Process A | Y/N | Y/N | Y/N | Y/N | x/4 |
| Process B | Y/N | Y/N | Y/N | Y/N | x/4 |
| Process C | Y/N | Y/N | Y/N | Y/N | x/4 |
| Process D | Y/N | Y/N | Y/N | Y/N | x/4 |
Reading the score:
Bring this table to your next leadership meeting. List six to eight candidate processes, score each against the four criteria and prioritise the highest score.
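For teams that prefer a spreadsheet or a script to a whiteboard, the matrix reduces to four yes/no answers per process and a sort. A minimal sketch; the `Candidate` class is an illustrative construct, and the Y/N answers below are placeholders, not findings:

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    volume: bool          # criterion 1: 1,000+ repetitions a month
    rule_clarity: bool    # criterion 2: 80%+ of decisions made by rule
    error_cost: bool      # criterion 3: errors are cheap or easily corrected
    data_readiness: bool  # criterion 4: single system, sufficient quality

    @property
    def score(self) -> int:
        """Number of criteria passed, out of four."""
        return sum(
            (self.volume, self.rule_clarity, self.error_cost, self.data_readiness)
        )


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Highest score first; ties keep the order in which they were listed."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)


# Placeholder answers for illustration only.
shortlist = [
    Candidate("Process A", True, False, True, True),
    Candidate("Process B", True, True, True, True),
    Candidate("Process C", False, False, True, False),
]

for candidate in rank(shortlist):
    print(f"{candidate.name}: {candidate.score}/4")
```

Because the sort is stable, listing candidates in order of strategic importance before ranking means ties resolve in favour of the process leadership cares about most.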
An initial review surfaced 12 candidate processes, each scored 0 to 5 per criterion. Customer complaint classification scored 4 or above on everything (volume 5, rules 4, error cost 5, data 4: 18/20). Route optimisation looked brilliant on paper, but its error cost scored 1; a wrong route means a late delivery, and a late delivery means a contractual penalty. The first AI agent therefore started on complaint classification: 71% correct categorisation within 90 days, and average response time down from 7.2 hours to 1.8.
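The case shows why a graded score needs a floor as well as a total: one very low criterion can disqualify a process that looks strong elsewhere. A sketch of that check follows, where the floor of 4 is an assumption inferred from "4 or above on everything". The complaint classification scores are the ones quoted above; for route optimisation only the error-cost score is given, so the other criteria are left unknown and nothing is filled in.

```python
from typing import Optional

CRITERIA = ("volume", "rules", "error_cost", "data")
FLOOR = 4  # assumption: minimum acceptable score on any single criterion


def total(scores: dict[str, Optional[int]]) -> Optional[int]:
    """Sum the four scores, or return None when any score is unknown."""
    values = [scores.get(c) for c in CRITERIA]
    if any(v is None for v in values):
        return None
    return sum(values)


def weakest(scores: dict[str, Optional[int]]) -> list[str]:
    """Return the criteria with a known score below the floor."""
    return [c for c in CRITERIA if scores.get(c) is not None and scores[c] < FLOOR]


complaints = {"volume": 5, "rules": 4, "error_cost": 5, "data": 4}
route_optimisation = {"error_cost": 1}  # remaining scores are not given

print(total(complaints), weakest(complaints))                  # 18 []
print(total(route_optimisation), weakest(route_optimisation))  # None ['error_cost']
```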
Three mistakes sink a first AI agent pilot more often than any technical fault.
First: choosing the hardest process for visibility. In the cases I have seen, companies that pick a difficult flagship process to impress the board fail far more often than they succeed. Stay away from low-volume, high-error-cost processes. Visibility arrives later anyway, once the agent has proved itself.
Second: planning to fix the data during the pilot. That plan fails. Where a data problem exists, it gets solved first and the agent comes second. The order never changes.
Third: a pilot with no owner. Who answers for a failed pilot? Who scales a successful one? Without an owner, the pilot is orphaned. The working model: a business department (finance, operations or sales) owns it, IT integrates it, the AI lead coordinates and the CEO sponsors. A first AI agent with that ownership structure survives its first crisis.
The first AI agent question is a decision question; the technology comes second. Until the right process, the right owner and the right measurement frame are in place, no agent investment pays back. The matrix compresses Yüce Zerey's advisory casework into a single page: volume, rule clarity, error cost and data readiness, scored honestly across your candidate processes.
Run the exercise this week. List six to eight candidates with your leadership team, score them against the four criteria and commit to a 90-day pilot on the winner. Your second agent, and your twelfth, depend on where your first AI agent lands.
→ Book a corporate AI briefing or keynote with Speaker Agency
Yüce Zerey speaks to boards and executive teams on AI strategy: 100-day AI roadmaps, corporate AI literacy, autonomous AI strategy, EU AI Act readiness and board-level reporting. He works in keynote, workshop, masterclass and advisory formats, and his sessions are built around the same decision matrices he applies in client engagements, including the four-criterion framework in this article.
Yüce Zerey is an AI strategy and transformation advisor with 25+ years of corporate leadership experience across Turkish and European enterprises. As Speaker Agency's AI keynote speaker, he leads literacy programmes, board-level briefings and 100-day transformation roadmaps for UK and EU organisations. His content is built on concrete decision matrices and measurable ROI frameworks.
Several processes score 4/4. Which one comes first?
Use strategic impact as the tie-breaker: which process touches your most critical metric fastest, whether that is margin, customer NPS or employee productivity? The second tie-breaker is a leader who wants the change; a pilot without leadership backing dies, so start in the department whose director is keenest.
What should I do if the pilot runs past four months?
A pilot beyond four months is a warning sign: the scope is too wide, the architecture is weak or the ownership is unclear. Review at the end of month four; if there are no early success signals, stopping the pilot is the kind option. Avoid the sunk-cost trap: pick a new process, carry the lessons over and restart. After three failed pilots, external advice becomes essential.
The pilot succeeded. How do I approach scaling?
Scaling is at least as demanding as the pilot. Three things change: volume rises 10 to 100 times, user diversity grows and error types multiply, so a pilot serving 100 people may need to serve 5,000 at scale. Document the pilot lessons, design the scale architecture (often from scratch), grow the team, start change management and expect scaling to take two to three times the pilot duration.
No process meets all four criteria. Is my organisation simply unready?
There are three possibilities. Your volumes may genuinely be low (typical for firms of 50 to 200 employees), in which case copilots are the better investment for now; your data problems may be heavy, which calls for 6 to 12 months of integration work first; or you may have applied the criteria too strictly. Starting with a 3/4 process is reasonable if the missing criterion is designed into the pilot.
When does bringing in external expertise make sense?
In three situations: for a company running its first ever pilot, after two failed pilots when the root cause needs finding, and at the scaling stage, which is a different discipline from piloting. A half-day workshop scoring six to eight candidate processes against the four-criterion matrix is a strong start for most organisations, followed by 90-day implementation support if needed.