Evaluate agent work in Qi Flows using UCAN authority, Claims, evidence, rubrics, and UDID records.
Build this when an AI agent should inspect evidence, apply rules, summarize findings, or recommend a decision without becoming the final source of accountability.
Agent evaluations in Qi are not ordinary model evaluations. You are not only asking whether an answer is fluent, correct, or useful. You are checking whether an agent acted within delegated authority, used the right evidence, applied the right rubric, and produced a verifiable decision record that people, services, regulators, or communities can inspect.
Treat the model response as a working output. Treat Claims, evidence references, UCAN delegations, Flow state, and UDID records as the accountable system of record.
Teams need agents to help with evidence review, claim processing, decision support, fulfillment checks, and workflow routing.
Ordinary agents create several risks:
they may act outside their authority
they may use stale or incomplete context
they may confuse submitted evidence with verified state
they may summarize without citing sources
they may recommend actions that the workflow does not allow
they may produce outputs that cannot be replayed, challenged, or audited
they may trigger payments, credentials, or state changes before a valid determination exists
In Qi, the Flow engine addresses these risks by making evaluation part of a governed state machine.
An agent evaluation should answer four questions:
Who was allowed to act?
Qi mechanism: UCAN delegation and capability checks
What was being evaluated?
Qi mechanism: Claims, entities, evidence, and Flow state
How was it evaluated?
Qi mechanism: Protocol, rubric, tools, checks, and citations
1. A participant, service, device, or agent submits a Claim, completes a task, or proposes a state transition inside a Qi Flow.
2. The Flow opens an evaluation context
The Flow instance defines the subject, claim type, protocol, rubric version, allowed evidence sources, current state snapshot, and decision boundary.
3. UCAN authority is checked
The Flow verifies that the agent has the required object-capability delegation for this exact action, resource, claim type, tool, time window, and Flow instance.
4. The agent retrieves permitted context
The agent may only inspect Claims, evidence, rooms, graph state, tools, and records that are inside its UCAN scope.
5. The agent applies the rubric
The agent checks the Claim against required fields, evidence rules, protocol constraints, scoring thresholds, disqualifiers, and escalation rules.
6. The agent emits an Evaluation Claim
The result is written as structured data with cited evidence references, applied checks, confidence, recommendation, limitations, and proof of the agent’s authority.
7. The Flow issues a UDID when ready
A UDID records the decision and impact determination: what was decided, why, under which authority, with which evidence, and what state or value changed.
8. Humans or services act on the result
The Flow routes the evaluation to approval, rejection, dispute, settlement, credential issuance, state update, or a request for more evidence.
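The authority check in step 3 can be sketched as a comparison between what the agent is asking to do and what was delegated to it. This is a minimal illustration, not the Qi or UCAN API: the `Capability` and `Request` shapes, their field names, the example identifiers, and `is_authorized` are all hypothetical. Real UCAN verification also validates signatures and the delegation chain, which this sketch omits.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class Capability:
    """Hypothetical flattened view of one delegated capability."""
    agent: str
    action: str
    claim_type: str
    flow_instance: str
    tools: frozenset
    not_before: datetime
    expires: datetime


@dataclass(frozen=True)
class Request:
    """Hypothetical description of what the agent wants to do right now."""
    agent: str
    action: str
    claim_type: str
    flow_instance: str
    tool: str
    at: datetime


def is_authorized(cap: Capability, req: Request) -> bool:
    # Every dimension must match; a mismatch on any one denies the request.
    return (
        cap.agent == req.agent
        and cap.action == req.action
        and cap.claim_type == req.claim_type
        and cap.flow_instance == req.flow_instance
        and req.tool in cap.tools
        and cap.not_before <= req.at < cap.expires
    )


# Illustrative values only.
cap = Capability(
    agent="did:example:oracle-01",
    action="claim/evaluate",
    claim_type="stove-usage",
    flow_instance="flow:claim-review:7781",
    tools=frozenset({"telemetry.read"}),
    not_before=datetime(2025, 1, 1, tzinfo=timezone.utc),
    expires=datetime(2025, 2, 1, tzinfo=timezone.utc),
)
req = Request(
    agent="did:example:oracle-01",
    action="claim/evaluate",
    claim_type="stove-usage",
    flow_instance="flow:claim-review:7781",
    tool="telemetry.read",
    at=datetime(2025, 1, 15, tzinfo=timezone.utc),
)
out_of_scope = replace(req, tool="payments.release")
expired = replace(req, at=datetime(2025, 3, 1, tzinfo=timezone.utc))

print(is_authorized(cap, req))           # True
print(is_authorized(cap, out_of_scope))  # False: tool was never delegated
print(is_authorized(cap, expired))       # False: outside the time window
```

Denying by default on any mismatch keeps the check aligned with the "exact action, resource, claim type, tool, time window, and Flow instance" wording in step 3.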
Do not begin with a fully autonomous approval workflow.
Start with one narrow review task where the agent can recommend, but not finalize, the outcome.
Good first tasks:
check whether a Claim is complete
classify evidence by type and relevance
detect missing required fields
compare submitted evidence against protocol requirements
summarize conflicting evidence
score one rubric section
recommend whether a human verifier should approve, reject, dispute, or request more evidence
Avoid first tasks where the agent can directly release funds, issue credentials, update high-value state, or approve irreversible outcomes.
The safest default is propose-only. Let the agent create an Evaluation Claim and propose the next Flow state. Let the Flow, verifier, or protocol decide whether the proposal becomes a UDID-backed determination.
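One way to picture the propose-only default is a transition function in which the agent's proposal is an input but never sufficient on its own. This is a hedged sketch under stated assumptions: `Proposal`, `apply_proposal`, the `ALLOWED_TRANSITIONS` table, and the `"accept"` decision value are hypothetical names, and only the `review_required` and `insufficient_evidence` states come from the example later in this guide.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical transition table for a claim-review Flow.
ALLOWED_TRANSITIONS = {
    "review_required": {"approved", "rejected", "insufficient_evidence", "disputed"},
}


@dataclass(frozen=True)
class Proposal:
    evaluation_claim: str  # reference to the agent's Evaluation Claim
    from_state: str
    to_state: str


def apply_proposal(current_state: str, proposal: Proposal,
                   verifier_decision: Optional[str]) -> str:
    """Return the resulting Flow state; a proposal alone never changes it."""
    if proposal.from_state != current_state:
        return current_state  # proposal was made against a different state
    if proposal.to_state not in ALLOWED_TRANSITIONS.get(current_state, set()):
        return current_state  # the workflow does not allow this transition
    if verifier_decision != "accept":
        return current_state  # no authorized decision yet
    return proposal.to_state


proposal = Proposal(
    evaluation_claim="eval:claim:stove-usage:000123:oracle-01",
    from_state="review_required",
    to_state="insufficient_evidence",
)
print(apply_proposal("review_required", proposal, None))      # review_required
print(apply_proposal("review_required", proposal, "accept"))  # insufficient_evidence
```

In this shape the agent has no code path that writes state directly, so widening its authority later is an explicit change to the Flow rather than a side effect of a prompt.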
Do not store private model scratchpad as the audit trail. Store evidence references, extracted facts, tool calls, applied checks, rule outcomes, recommendation, limitations, and the final rationale that reviewers can inspect.
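A simple way to enforce this is to build the stored record from an allowlist of reviewable fields, so anything else the model returns is dropped before it reaches the audit trail. The sketch below assumes the agent returns a dictionary. The key spellings in `AUDIT_FIELDS`, the `to_audit_record` helper, and the example identifiers are hypothetical and not part of a Qi schema.

```python
# Reviewable fields named in the paragraph above; key spellings are assumptions.
AUDIT_FIELDS = (
    "evidenceRefs", "extractedFacts", "toolCalls", "appliedChecks",
    "ruleOutcomes", "recommendation", "limitations", "rationale",
)


def to_audit_record(agent_output: dict) -> dict:
    """Keep only reviewable fields; everything else (e.g. scratchpad) is dropped."""
    missing = [f for f in AUDIT_FIELDS if f not in agent_output]
    if missing:
        raise ValueError(f"agent output is missing audit fields: {missing}")
    return {f: agent_output[f] for f in AUDIT_FIELDS}


# Illustrative agent output with hypothetical identifiers.
agent_output = {
    "scratchpad": "private intermediate reasoning that should not be stored",
    "evidenceRefs": ["evidence:telemetry:example-1", "evidence:field-report:example-2"],
    "extractedFacts": {"reportDateInPeriod": False},
    "toolCalls": ["telemetry.read"],
    "appliedChecks": ["telemetry_present", "report_date_in_period"],
    "ruleOutcomes": {"telemetry_present": "pass", "report_date_in_period": "fail"},
    "recommendation": "request_more_evidence",
    "limitations": ["Field report could not be cross-checked against a second source."],
    "rationale": "Telemetry is valid, but the field report date conflicts with the reporting period.",
}

record = to_audit_record(agent_output)
print("scratchpad" in record)  # False
```

Raising on missing fields, rather than storing a partial record, keeps incomplete evaluations out of the system of record.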
A UDID is created when the Flow has enough information to record a decision and impact determination.
Do not create a UDID for every intermediate model output. Create or update a UDID when the Flow reaches a determination point.
A UDID should record:
`udid`: Unique determination identifier.
`decisionType`: Approval, rejection, request for evidence, dispute, settlement, credential issuance, state update, or no-op.
`subjectClaims`: Claims considered.
`evaluationClaims`: Evaluations used.
`authority`: UCANs, credentials, verifier role, or governance authority.
`rubric`: Rubric and protocol version applied.
`evidence`: Evidence references used in the determination.
`determination`: Final decision.
`impact`: What changed or will change because of the decision.
`stateTransition`: Flow transition or graph update authorized by the determination.
`approver`: Human, service, governance process, or authorized verifier.
`timestamp`: Determination time.
`proof`: Signature, attestation, transaction hash, or other proof.
`disputeWindow`: Period or condition under which the determination can be challenged.
Example UDID shape:
{
  "udid": "udid:flow:claim-review:7781:determination:001",
  "type": "UniversalDecisionAndImpactDetermination",
  "decisionType": "request_more_evidence",
  "subjectClaims": [
    "claim:stove-usage:000123"
  ],
  "evaluationClaims": [
    "eval:claim:stove-usage:000123:oracle-01"
  ],
  "authority": {
    "verifier": "did:ixo:person:human-verifier-17",
    "agentUcan": "ucan:proof:...",
    "protocol": "blueprint:clean-cooking-mrv:v1"
  },
  "rubric": {
    "id": "rubric:stove-usage-review:v1",
    "version": "1.0.0"
  },
  "determination": {
    "status": "more_evidence_required",
    "reason": "Telemetry is present and valid, but the field report date conflicts with the reporting period."
  },
  "impact": {
    "paymentReleased": false,
    "credentialIssued": false,
    "claimStatus": "evidence_requested"
  },
  "stateTransition": {
    "from": "review_required",
    "to": "insufficient_evidence"
  },
  "disputeWindow": "P14D"
}
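Before a determination is finalized, a Flow could check a draft record against the field list above. The following is a minimal sketch, assuming every listed field is mandatory, which this guide does not state outright. `REQUIRED_UDID_FIELDS`, `missing_udid_fields`, and the `draft` record are illustrative names and values, not a Qi API.

```python
# Field names come from the list above; treating all of them as
# mandatory is an assumption made for this sketch.
REQUIRED_UDID_FIELDS = (
    "udid", "decisionType", "subjectClaims", "evaluationClaims",
    "authority", "rubric", "evidence", "determination", "impact",
    "stateTransition", "approver", "timestamp", "proof", "disputeWindow",
)


def missing_udid_fields(record: dict) -> list:
    """Return listed fields that are absent or empty, in list order."""
    return [
        field for field in REQUIRED_UDID_FIELDS
        if record.get(field) in (None, "", [], {})
    ]


# A draft record that is not yet ready to be finalized.
draft = {
    "udid": "udid:flow:claim-review:7781:determination:001",
    "decisionType": "request_more_evidence",
    "subjectClaims": ["claim:stove-usage:000123"],
    "evaluationClaims": ["eval:claim:stove-usage:000123:oracle-01"],
    "authority": {"verifier": "did:ixo:person:human-verifier-17"},
    "rubric": {"id": "rubric:stove-usage-review:v1", "version": "1.0.0"},
    "determination": {"status": "more_evidence_required"},
    "impact": {"paymentReleased": False, "credentialIssued": False},
    "stateTransition": {"from": "review_required", "to": "insufficient_evidence"},
    "disputeWindow": "P14D",
}

print(missing_udid_fields(draft))
# ['evidence', 'approver', 'timestamp', 'proof']
```

A non-empty result can block the state transition until the Flow has attached the remaining references and proof.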
Require evidence references for every finding. Reject evaluations that cite documents, measurements, Claims, or state that are not present in the permitted context.
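This rule can be enforced mechanically before an Evaluation Claim is accepted. The sketch below assumes each finding is a dictionary with an `evidenceRefs` list and that the Flow holds the set of references the agent was permitted to read. Both shapes, the `citation_problems` helper, and the identifiers are hypothetical.

```python
def citation_problems(findings: list, permitted_refs: set) -> list:
    """Return (finding_index, problem) pairs; an empty list means all citations are valid."""
    problems = []
    for index, finding in enumerate(findings):
        refs = finding.get("evidenceRefs", [])
        if not refs:
            problems.append((index, "no evidence reference"))
        for ref in refs:
            if ref not in permitted_refs:
                problems.append((index, f"not in permitted context: {ref}"))
    return problems


# Illustrative data with hypothetical identifiers.
permitted = {"evidence:telemetry:example-1", "evidence:field-report:example-2"}
findings = [
    {"text": "Telemetry is present and valid.",
     "evidenceRefs": ["evidence:telemetry:example-1"]},
    {"text": "Field report date conflicts with the reporting period.",
     "evidenceRefs": ["evidence:field-report:example-9"]},
    {"text": "Household is eligible.", "evidenceRefs": []},
]

for problem in citation_problems(findings, permitted):
    print(problem)
# (1, 'not in permitted context: evidence:field-report:example-9')
# (2, 'no evidence reference')
```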
The agent has too much authority
Replace broad API keys with UCAN capability delegation. Scope authority by Flow instance, resource, claim type, tool, time window, and allowed action.
The agent uses stale context
Include a state snapshot reference in the Evaluation Claim. If the graph state changes, require a new evaluation or explicit refresh.
The rubric is too vague
Convert policy language into checks, thresholds, disqualifiers, and escalation rules. Ambiguity should route to human review.
The model output is treated as truth
Write structured Evaluation Claims and UDID records. A chat response should not be the source of truth for settlement, credentials, or state changes.
The agent approves its own work
Separate task execution from evaluation. Use independent evaluators or human review for high-stakes decisions.
The Flow cannot explain a decision
Require the UDID to reference Claims, evaluations, evidence, rubric version, authority, decision, impact, and proof.
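Two of these fixes, the snapshot check and the separation of execution from evaluation, reduce to simple guards that a Flow can run before it uses an evaluation. This is a sketch under assumptions: the `stateSnapshot` and `evaluator` keys, the blocker labels, the identifiers, and `evaluation_blockers` are hypothetical.

```python
def evaluation_blockers(evaluation: dict, current_snapshot: str,
                        task_executor: str) -> list:
    """Return reasons this evaluation must not be used for a determination."""
    blockers = []
    if evaluation["stateSnapshot"] != current_snapshot:
        # Graph state moved on: require a new evaluation or explicit refresh.
        blockers.append("stale_snapshot")
    if evaluation["evaluator"] == task_executor:
        # The same agent both did the work and judged it.
        blockers.append("self_evaluation")
    return blockers


# Illustrative values with hypothetical identifiers.
evaluation = {
    "id": "eval:claim:stove-usage:000123:oracle-01",
    "evaluator": "did:example:oracle-01",
    "stateSnapshot": "snapshot:example-41",
}

print(evaluation_blockers(evaluation, "snapshot:example-41", "did:example:field-agent-07"))
# []
print(evaluation_blockers(evaluation, "snapshot:example-42", "did:example:oracle-01"))
# ['stale_snapshot', 'self_evaluation']
```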
A field operator submits a Claim that a clean cooking device was used during a reporting period.
The Qi Flow:
receives the Claim
checks that the Evidence Review Oracle has UCAN authority to inspect this claim type
retrieves linked telemetry, field report, device entity, household entity, and active protocol rules
asks the agent to apply the usage review rubric
records an Evaluation Claim with findings, evidence references, score, recommendation, and limitations
routes the recommendation to a human verifier because the score is below the automatic threshold
records a UDID after the verifier decides to request more evidence
updates the Flow state to `insufficient_evidence`
notifies the claimant about the missing or conflicting evidence
The agent helped evaluate the Claim, but it did not become the final authority. The accountable record is the combination of UCAN delegation, Claim, evidence, Evaluation Claim, human review, Flow transition, and UDID.
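The routing step in this example, where a score below the automatic threshold sends the recommendation to a human verifier, can be sketched as weighted rubric checks followed by a threshold comparison. The check names, weights, threshold value, and route labels are invented for illustration. A real rubric would also carry disqualifiers and escalation rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool
    weight: float


def rubric_score(results: list) -> float:
    """Weighted share of passed checks, between 0.0 and 1.0."""
    total = sum(r.weight for r in results)
    if total == 0:
        return 0.0
    return sum(r.weight for r in results if r.passed) / total


def route(results: list, auto_threshold: float) -> str:
    # Anything below the threshold goes to a human verifier.
    if rubric_score(results) >= auto_threshold:
        return "auto_determination"
    return "human_review"


# Hypothetical checks for the stove-usage example.
results = [
    CheckResult("telemetry_present", True, 0.5),
    CheckResult("telemetry_valid", True, 0.25),
    CheckResult("report_date_in_period", False, 0.25),
]

print(rubric_score(results))  # 0.75
print(route(results, 0.9))    # human_review
```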