Why the unit of work is important
Prompts are stateless. Units of work make outcomes definable, verifiable, priceable, and accountable.
A prompt is an instruction; a unit of work is a job. Everything a business needs from work (definition, verification, pricing, accountability) requires the job, not the instruction.
A prompt is a single instruction with no memory, no files, no budget, and no record. Whatever an agent does in response is unowned and unverifiable: when the reply scrolls away, there is no artifact left to check, price, or hand off. Every serious deficiency of prompt-driven work traces back to this statelessness.
- One instruction
- No state, no files
- No budget, no record
- Nothing to verify
Figure 1: A prompt is an instruction; a unit of work is a job. The job needs a workspace (runtime, memory, files, tools, budget, and record) to live in while it is being done.
The unit of work matters for four reasons.
It makes work definable. "Resolve this ticket" with an explicit end state is a specification; a chat thread is not. A definition of done forces the requester to state what a finished, correct result looks like before execution starts, which is the point where most delegation failures actually originate.
It makes work verifiable. Because the unit is scoped to a state change (the ticket closed, the contract reviewed, the change merged), completion is checked by comparing the world before against the world after, the same check a manager makes today. Verification by state comparison is robust in a way that verification by reading the agent's self-report is not.
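Verification by state comparison can be sketched in a few lines. This is a minimal sketch, assuming the relevant world state can be captured as a flat mapping of field names to values and that the definition of done lists the values the end state must hold. The function name `verify`, the field names, and the return shape are hypothetical. Note that the agent's self-report is never an input.

```python
from typing import Any, Dict, Mapping

# Hypothetical representation: a snapshot of world state as field -> value.
State = Mapping[str, Any]


def verify(before: State, after: State, done: State) -> Dict[str, Any]:
    """Judge a unit of work by comparing snapshots, never by reading the agent's report."""
    # Fields from the definition of done that the end state does not satisfy.
    unmet = {k: v for k, v in done.items() if after.get(k) != v}
    # Every field whose value differs between the two snapshots, kept for the record.
    changed = {
        k: (before.get(k), after.get(k))
        for k in set(before) | set(after)
        if before.get(k) != after.get(k)
    }
    return {"passed": not unmet, "unmet": unmet, "changed": changed}


before = {"ticket_status": "open", "assignee": "agent-7"}
after = {"ticket_status": "closed", "assignee": "agent-7"}
print(verify(before, after, {"ticket_status": "closed"})["passed"])  # True

# The agent may claim the ticket is closed; only the snapshots count.
print(verify(before, before, {"ticket_status": "closed"})["passed"])  # False
```

The `changed` map doubles as the record of what the unit actually did, which is what a reviewer would inspect when a check fails.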
It makes work priceable. Tokens price the machinery; units of work price the result. A budget attached to the unit bounds what reaching the outcome is worth, and the gap between budget and actual spend becomes a clean, per-outcome efficiency signal instead of a smeared monthly token bill.
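The efficiency signal described above can be sketched as a small per-unit ledger. This is a minimal sketch, assuming costs are metered in integer cents and attributed to a unit as they occur. `MeteredUnit`, `charge`, and the ticket ID are hypothetical names, and the charge amounts are illustrative rather than real prices.

```python
from dataclasses import dataclass


@dataclass
class MeteredUnit:
    """Hypothetical per-outcome cost record: budget fixed up front, spend metered."""

    unit_id: str
    budget_cents: int      # what reaching this outcome is worth
    spent_cents: int = 0   # actual metered cost (model calls, tools, compute)

    def __post_init__(self) -> None:
        if self.budget_cents <= 0:
            raise ValueError("budget must be positive")

    def charge(self, cents: int) -> None:
        """Meter one cost event against this unit, not against a pooled bill."""
        if cents < 0:
            raise ValueError("charges must be non-negative")
        self.spent_cents += cents

    @property
    def headroom_cents(self) -> int:
        # Budget minus actual spend; negative means the unit ran over budget.
        return self.budget_cents - self.spent_cents

    @property
    def efficiency(self) -> float:
        # Share of the budget left unspent, comparable across units of any size.
        return self.headroom_cents / self.budget_cents


unit = MeteredUnit("TICKET-1042", budget_cents=500)
for cost in (120, 80):  # illustrative charges, e.g. two model calls
    unit.charge(cost)
print(unit.headroom_cents, unit.efficiency)  # 300 0.6
```

Because every charge lands on a specific unit, summing `spent_cents` across units recovers the monthly bill. The reverse is impossible: a pooled bill cannot be split back into per-outcome figures.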
It makes work accountable. One unit, one owner. When something ships broken, there is exactly one place to look. This sounds managerial rather than technical, but it is the property that lets businesses extend trust to agents at all.
There is also a subtler reason, borrowed from economics: Goodhart's law, which holds that any proxy metric placed under pressure gets gamed. Activity metrics (messages sent, tokens generated, hours simulated) are weak proxies, and agents optimize them effortlessly. A verified state change is the strongest measure available, because it is not a proxy for the outcome but the outcome itself. Building the economy of agent work on units of work is, in effect, choosing the hardest-to-game measure as the unit of account.
AI spend is rising faster than the ability to account for it
Business spending on AI is growing at a pace with few precedents: Deloitte's enterprise AI survey finds budgets rising across the board while ROI stays elusive, and MIT research found that the vast majority of generative-AI pilots showed no measurable P&L impact despite billions invested. The problem is not that the AI does nothing; it is that there is no fair way to compare what it costs against what it returns. A token bill is not comparable to anything a business already measures: it swings with model choice, verbosity, and retries, it pools unrelated work into one line item, and it prices the machinery rather than any result.
Every other cost a business carries has a unit it is judged in: a contractor is judged per engagement, a team per quarter against its goals, a vendor per delivered scope. The unit of work is the closest possible analogue for AI: a budget set per outcome, actual spend metered against it, and efficiency read as the gap between the two. It does not ask the business to learn a new kind of accounting; it puts AI spend into the accounting the business already runs on. That is what makes AI costs objectively comparable across models, across agents, across vendors, and against the human alternative, for the first time.
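Once every unit carries a budget and a verified result, comparing providers reduces to one figure each: cost per verified outcome. A minimal sketch, assuming each completed unit is logged with its provider, its metered spend, and whether it passed the state check. `Outcome`, `cost_per_verified_outcome`, and the provider labels are hypothetical, and the figures in the example are illustrative, not measurements.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Outcome(NamedTuple):
    provider: str      # hypothetical label: a model, an agent, a vendor, or a human team
    spent_cents: int   # metered cost of the unit, whether or not it passed
    verified: bool     # result of the state check on that unit


def cost_per_verified_outcome(outcomes: Iterable[Outcome]) -> Dict[str, float]:
    """Total spend divided by verified outcomes, per provider."""
    spend: Dict[str, int] = defaultdict(int)
    wins: Dict[str, int] = defaultdict(int)
    for o in outcomes:
        spend[o.provider] += o.spent_cents
        wins[o.provider] += int(o.verified)
    # Failed units stay in the numerator: a retry or a rejected result still cost money.
    return {p: spend[p] / wins[p] if wins[p] else float("inf") for p in spend}


# Illustrative figures only, not measurements.
history = [
    Outcome("agent-a", 200, True),
    Outcome("agent-a", 150, False),
    Outcome("agent-a", 250, True),
    Outcome("human-team", 900, True),
]
print(cost_per_verified_outcome(history))  # {'agent-a': 300.0, 'human-team': 900.0}
```

The same function ranks a model, an agent, a vendor, or a human team, because none of them is measured in its own units. A provider with no verified outcomes gets an infinite cost rather than a misleading zero.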
Businesses run on intent → action → impact
A business does not think in models, agents, or tokens, and it should not have to. Those are abstractions of the machinery, and no operator judges machinery in the machinery's own units; nobody prices an accountant by spreadsheet keystrokes. The loop a business actually runs is: state an intent, someone takes action, and the business measures the impact. Everything it already manages (tickets, contracts, campaigns, quarters) is shaped like that loop.
The unit of work maps one-to-one onto this loop: intent becomes the definition of done plus the budget, action becomes execution inside the workspace, impact becomes the verified state change. The model, the agent, and the tokens all disappear inside the action step as swappable implementation details, invisible to the operator, exactly as they should be. This is what lets a business put AI directly into its day-to-day operations rather than alongside them: an agent's unit of work sits in the same queue, carries the same kind of budget, and is judged by the same state check as work done by people.
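The mapping of intent, action, and impact onto a unit of work can be sketched as one function that treats the executor as an opaque, swappable callable. This is a minimal sketch under stated assumptions: workspace state is a flat dictionary, an executor returns its new state plus its metered cost in cents, and `UnitOfWork`, `run`, `llm_agent`, and `human_operator` are hypothetical names, with the two executors as toy stand-ins.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

State = Dict[str, Any]
# Anything that can act on the workspace and report its metered cost. A model,
# an agent, or a person can sit behind this signature; the loop never asks which.
Executor = Callable[[State], Tuple[State, int]]


@dataclass
class UnitOfWork:
    done: State          # intent: the end state that counts as finished
    budget_cents: int    # intent: what reaching that state is worth


def run(unit: UnitOfWork, workspace: State, executor: Executor) -> Dict[str, Any]:
    # Action: the executor works on a copy, so the original snapshot survives.
    after, spent = executor(dict(workspace))
    # Impact: the same state check applies whoever did the work.
    verified = all(after.get(k) == v for k, v in unit.done.items())
    # Budget is checked after the fact here; a real runtime would meter as it goes.
    return {"verified": verified, "within_budget": spent <= unit.budget_cents,
            "spent_cents": spent}


def llm_agent(s: State) -> Tuple[State, int]:       # hypothetical stand-in
    s["ticket_status"] = "closed"
    return s, 140


def human_operator(s: State) -> Tuple[State, int]:  # hypothetical stand-in
    s["ticket_status"] = "closed"
    return s, 900


unit = UnitOfWork(done={"ticket_status": "closed"}, budget_cents=500)
for worker in (llm_agent, human_operator):
    print(worker.__name__, run(unit, {"ticket_status": "open"}, worker))
```

Swapping `llm_agent` for `human_operator` changes nothing in `run`: both are judged by the same state check and the same budget, which is the sense in which the model disappears inside the action step.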
Overview
A research note on why the unit of work matters, how it unlocks collaboration, and why the environment, not the agent, is the key part of the process.
How it enables collaboration
The unit of work is the contract that crosses every handoff: definition of done, budget, identity, state, and record.
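One way to picture that contract is as an immutable record that each handoff copies forward. A minimal sketch, assuming the five elements named above map onto plain fields and that a handoff may not push spend past the budget. `WorkContract`, `hand_off`, the unit ID, and the owner names are hypothetical.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class WorkContract:
    """Hypothetical shape of what crosses a handoff between agents or people."""

    unit_id: str                          # identity: stable across every handoff
    definition_of_done: Dict[str, Any]    # travels unchanged
    budget_cents: int                     # travels unchanged
    owner: str                            # exactly one owner at a time
    spent_cents: int = 0
    state: Dict[str, Any] = field(default_factory=dict)
    record: Tuple[str, ...] = ()          # append-only history of handoffs


def hand_off(c: WorkContract, to: str, spent_cents: int,
             state: Dict[str, Any]) -> WorkContract:
    """Pass the unit to a new owner; spent_cents is what the outgoing owner used."""
    if c.spent_cents + spent_cents > c.budget_cents:
        raise ValueError(f"{c.unit_id}: handoff would exceed budget")
    entry = f"{c.owner} -> {to}: spent {spent_cents}c"
    # Returns a new contract; the previous one stays intact as history.
    return replace(c, owner=to, spent_cents=c.spent_cents + spent_cents,
                   state=dict(state), record=c.record + (entry,))


c0 = WorkContract("REV-42", {"contract_reviewed": True}, 1000, "intake-agent")
c1 = hand_off(c0, "review-agent", 150, {"draft": "v1"})
print(c1.owner, c1.spent_cents, c1.record)
```

Only the owner, spend, state, and record move at a handoff; the identity, definition of done, and budget are never reassigned, so whoever receives the unit inherits the same job it started as.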