Build this when an AI agent should inspect evidence, apply rules, summarize findings, or recommend a decision without becoming the final source of accountability. Agent evaluations in Qi are not ordinary model evaluations. You are not only asking whether an answer is fluent, correct, or useful. You are checking whether an agent acted within delegated authority, used the right evidence, applied the right rubric, and produced a verifiable decision record that people, services, regulators, or communities can inspect.
The problem
Teams need agents to help with evidence review, claim processing, decision support, fulfillment checks, and workflow routing. Ordinary agents create several risks:
- they may act outside their authority
- they may use stale or incomplete context
- they may confuse submitted evidence with verified state
- they may summarize without citing sources
- they may recommend actions that the workflow does not allow
- they may produce outputs that cannot be replayed, challenged, or audited
- they may trigger payments, credentials, or state changes before a valid determination exists
An evaluation record should answer four questions:
- Who was allowed to act?
- What was being evaluated?
- How was it evaluated?
- What was determined?
What you build
You build a Qi Flow that evaluates agent work or agent-assisted reviews against verified context. The Flow should:
- receive a Claim, task output, or Flow event
- check the agent’s UCAN authority
- resolve the current state of the relevant entities, Claims, credentials, and evidence
- run the evaluation against a declared rubric
- produce a structured Evaluation Claim
- produce or update a UDID when a decision and impact determination is ready
- route the result to a human, service, payment step, credential step, dispute path, or next Flow state
Core pattern
A Claim or task enters evaluation
The Flow opens an evaluation context
UCAN authority is checked
The agent retrieves permitted context
The agent applies the rubric
The agent emits an Evaluation Claim
The Flow issues a UDID when ready
Key concepts
- UCAN
- Claim
- Evaluation Claim
- UDID
- Flow state
- Rubric
- Evidence reference
- Agentic Oracle
Start with one evaluation
Do not begin with a fully autonomous approval workflow. Start with one narrow review task where the agent can recommend, but not finalize, the outcome. Good first tasks:- check whether a Claim is complete
- classify evidence by type and relevance
- detect missing required fields
- compare submitted evidence against protocol requirements
- summarize conflicting evidence
- score one rubric section
- recommend whether a human verifier should approve, reject, dispute, or request more evidence
Design the evaluation Flow
Use this minimum Flow shape:
- `submitted`
- `authority_check`
- `context_resolved`
- `evaluating`
- `review_required`
- `determined`
- `actioned`
- `closed`

Failure and exception states:
- `unauthorized`
- `insufficient_evidence`
- `rubric_failed`
- `conflict_detected`
- `human_escalation`
- `disputed`
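The states above can be sketched as an explicit transition table, so that any move the table does not list is rejected. The state names come from this page; the `ALLOWED` map and the `transition` helper are assumptions for illustration, since the page names the states but does not say which moves are legal. Adjust the map to your protocol.

```python
from enum import Enum


class FlowState(str, Enum):
    SUBMITTED = "submitted"
    AUTHORITY_CHECK = "authority_check"
    CONTEXT_RESOLVED = "context_resolved"
    EVALUATING = "evaluating"
    REVIEW_REQUIRED = "review_required"
    DETERMINED = "determined"
    ACTIONED = "actioned"
    CLOSED = "closed"
    UNAUTHORIZED = "unauthorized"
    INSUFFICIENT_EVIDENCE = "insufficient_evidence"
    RUBRIC_FAILED = "rubric_failed"
    CONFLICT_DETECTED = "conflict_detected"
    HUMAN_ESCALATION = "human_escalation"
    DISPUTED = "disputed"


S = FlowState

# Hypothetical transition map (not taken from the Qi documentation).
ALLOWED = {
    S.SUBMITTED: {S.AUTHORITY_CHECK},
    S.AUTHORITY_CHECK: {S.CONTEXT_RESOLVED, S.UNAUTHORIZED},
    S.CONTEXT_RESOLVED: {S.EVALUATING, S.INSUFFICIENT_EVIDENCE},
    S.EVALUATING: {S.REVIEW_REQUIRED, S.RUBRIC_FAILED,
                   S.CONFLICT_DETECTED, S.INSUFFICIENT_EVIDENCE},
    S.REVIEW_REQUIRED: {S.DETERMINED, S.HUMAN_ESCALATION, S.INSUFFICIENT_EVIDENCE},
    S.RUBRIC_FAILED: {S.HUMAN_ESCALATION},
    S.CONFLICT_DETECTED: {S.HUMAN_ESCALATION},
    S.HUMAN_ESCALATION: {S.DETERMINED, S.INSUFFICIENT_EVIDENCE},
    S.DETERMINED: {S.ACTIONED, S.DISPUTED},
    S.DISPUTED: {S.REVIEW_REQUIRED, S.CLOSED},
    S.ACTIONED: {S.CLOSED},
    S.INSUFFICIENT_EVIDENCE: {S.SUBMITTED, S.CLOSED},
    S.UNAUTHORIZED: {S.CLOSED},
    S.CLOSED: set(),
}


def transition(current: FlowState, target: FlowState) -> FlowState:
    """Return the target state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the table as data rather than scattered `if` statements makes it easier to review which paths can reach `actioned`.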
Configure UCAN authority
A UCAN should be scoped to the smallest useful evaluation task. Define:
- issuer: the human, organization, POD, or service delegating authority
- audience: the agent or Agentic Oracle DID receiving the authority
- resource: the POD, Flow instance, Claim Collection, Claim, entity, evidence set, room, or tool
- capabilities: the exact actions the agent may perform
- constraints: limits on claim type, time, budget, tool use, output type, state transition, and approval power
- expiry: when the delegation ends
- revocation path: how the delegation can be suspended or revoked
- proof chain: how the Flow verifies the delegation
Capabilities that suit a first evaluation agent:
- `claim.read`
- `evidence.read`
- `entity.read`
- `rubric.read`
- `evaluation.create`
- `state.propose`
- `message.create`

Capabilities to withhold from a first evaluation agent:
- `payment.release`
- `credential.issue`
- `state.update`
- `claim.approve`
- `policy.change`
Example UCAN design shape
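A minimal sketch of the delegation fields listed above, written as a plain Python data class rather than a real UCAN token. The class name, the `is_authorized` helper, the DIDs, the resource identifier, and the constraint keys are all placeholders, and the sketch does not verify a proof chain or signature.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Delegation:
    issuer: str              # DID delegating authority (placeholder below)
    audience: str            # DID of the agent or Agentic Oracle
    resource: str            # what the authority applies to
    capabilities: frozenset  # exact actions the agent may perform
    constraints: dict        # e.g. claim type, output type
    expires_at: datetime     # when the delegation ends
    revoked: bool = False    # set by the revocation path


def is_authorized(d, agent_did, action, resource, claim_type, now):
    """Check revocation, expiry, audience, resource, capability, and claim type."""
    return (
        not d.revoked
        and now < d.expires_at
        and d.audience == agent_did
        and d.resource == resource
        and action in d.capabilities
        and d.constraints.get("claimType") in (None, claim_type)
    )


# Hypothetical delegation for a read-and-recommend agent.
delegation = Delegation(
    issuer="did:example:verifier-org",
    audience="did:example:evidence-review-oracle",
    resource="claim-collection:example-1",
    capabilities=frozenset({
        "claim.read", "evidence.read", "rubric.read",
        "evaluation.create", "state.propose",
    }),
    constraints={"claimType": "device-usage", "outputType": "recommendation"},
    expires_at=datetime(2030, 1, 1, tzinfo=timezone.utc),
)
now = datetime(2029, 6, 1, tzinfo=timezone.utc)
```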
Use this as a design shape, then map it to the canonical Qi and IXO SDK fields used in your implementation.
Define the Claim under review
Each evaluation should start with a clear Claim. Minimum Claim inputs:
- `claimId`
- `claimType`
- `issuer`
- `subject`
- `data`
- `evidence`
- `proof`
- `submittedAt`
- `protocolId`
- `flowId`
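One way to enforce the minimum inputs is a completeness check that runs before the Flow leaves `submitted`. In the sketch below the field names come from the list above, while the field types and the `missing_inputs` helper are assumptions, not the canonical Qi Claim schema.

```python
from dataclasses import dataclass, fields


@dataclass
class Claim:
    claimId: str
    claimType: str
    issuer: str       # assumed to be a DID string
    subject: str      # assumed to be a DID or entity identifier
    data: dict
    evidence: list    # assumed to be a list of evidence references
    proof: str
    submittedAt: str  # assumed to be an ISO 8601 timestamp
    protocolId: str
    flowId: str


def missing_inputs(raw: dict) -> list:
    """Return the names of minimum Claim inputs that are absent or empty."""
    empty = (None, "", [], {})
    return [f.name for f in fields(Claim) if raw.get(f.name) in empty]
```

A non-empty result is a natural trigger for routing the Claim to `insufficient_evidence` instead of spending agent time on it.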
Define the rubric
The rubric converts protocol rules into checks that the agent and Flow can apply. A practical rubric should include:
- required evidence
- evidence freshness rules
- source authenticity checks
- data integrity checks
- field completeness checks
- allowed value ranges
- consistency checks across evidence sources
- disqualifying conditions
- scoring rules
- minimum score for recommendation
- conditions that require human review
- conditions that require dispute or investigation
- allowed Flow transitions after evaluation
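As a rough illustration of how such a rubric might be applied, the sketch below models weighted checks, one disqualifying condition, and a minimum score. The class names, the weighting scheme, the example checks, and the recommendation labels are assumptions chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    name: str
    weight: int
    passed: Callable[[dict], bool]
    disqualifying: bool = False  # failing this check forces a reject


@dataclass
class Rubric:
    rubric_id: str
    version: str
    checks: list
    min_score: float  # minimum score for a positive recommendation


def apply_rubric(rubric: Rubric, context: dict) -> dict:
    """Run every check, compute a weighted score, and derive a recommendation."""
    results = {c.name: bool(c.passed(context)) for c in rubric.checks}
    total = sum(c.weight for c in rubric.checks)
    score = sum(c.weight for c in rubric.checks if results[c.name]) / total
    disqualified = any(c.disqualifying and not results[c.name] for c in rubric.checks)
    if disqualified:
        recommendation = "reject"
    elif score >= rubric.min_score:
        recommendation = "approve"
    else:
        recommendation = "request_more_evidence"
    return {"checks": results, "score": score, "recommendation": recommendation}


# Hypothetical rubric: telemetry required, evidence no older than 30 days,
# and an invalid evidence hash is disqualifying.
rubric = Rubric(
    rubric_id="usage-review",
    version="1.0.0",
    min_score=0.8,
    checks=[
        Check("telemetry_present", 2, lambda ctx: bool(ctx.get("telemetry"))),
        Check("evidence_fresh", 1, lambda ctx: ctx.get("age_days", 999) <= 30),
        Check("hash_valid", 1, lambda ctx: ctx.get("hash_valid", False), disqualifying=True),
    ],
)
result = apply_rubric(rubric, {"telemetry": [1, 2], "age_days": 45, "hash_valid": True})
```

Here the stale evidence fails one check, so the weighted score is 3 out of 4 and the recommendation falls back to requesting more evidence.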
Run the evaluation
The agent evaluation should produce structured output, not a free-form opinion. Minimum Evaluation Claim output:
- `evaluationId`
- `subjectClaimId`
- `evaluatorDid`
- `ucanProof`
- `rubricId`
- `rubricVersion`
- `stateSnapshotRef`
- `evidenceRefs`
- `checks`
- `findings`
- `score`
- `recommendation`
- `confidence`
- `limitations`
- `proposedTransition`
- `proof`
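A sketch of how the Flow might refuse unstructured output: it rejects any evaluation that lacks a required field or cites no evidence, then attaches a content digest. The `seal_evaluation` helper is hypothetical, and the SHA-256 digest is only a stand-in for `proof`; a real deployment would presumably use a signature from the evaluator's DID.

```python
import hashlib
import json

REQUIRED = (
    "evaluationId", "subjectClaimId", "evaluatorDid", "ucanProof",
    "rubricId", "rubricVersion", "stateSnapshotRef", "evidenceRefs",
    "checks", "findings", "score", "recommendation", "confidence",
    "limitations", "proposedTransition",
)


def seal_evaluation(body: dict) -> dict:
    """Validate the minimum fields, then attach a digest of the canonical JSON."""
    missing = [k for k in REQUIRED if k not in body]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not body["evidenceRefs"]:
        raise ValueError("an evaluation must cite at least one evidence reference")
    # Sorted keys and fixed separators make the digest reproducible on replay.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**body, "proof": {"type": "sha256-digest", "value": digest}}
```

Because the digest is computed over a canonical serialization, a reviewer who replays the evaluation from stored records can recompute it and detect any later edit.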
Create the UDID
A UDID is created when the Flow has enough information to record a decision and impact determination. Do not create a UDID for every intermediate model output. Create or update a UDID when the Flow reaches a determination point. A UDID should record:
- `udid`
- `decisionType`
- `subjectClaims`
- `evaluationClaims`
- `authority`
- `rubric`
- `evidence`
- `determination`
- `impact`
- `stateTransition`
- `approver`
- `timestamp`
- `proof`
- `disputeWindow`
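The rule that intermediate outputs do not get a UDID can be sketched as a guard in front of the record builder. The `build_udid` helper, the readiness test (a determination plus an approver), and the `udid:example:` identifier format are assumptions for illustration only.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional

UDID_FIELDS = (
    "decisionType", "subjectClaims", "evaluationClaims", "authority",
    "rubric", "evidence", "determination", "impact", "stateTransition",
    "approver", "proof", "disputeWindow",
)


def build_udid(record: dict) -> Optional[dict]:
    """Return a UDID record, or None if no determination has been reached."""
    if not record.get("determination") or not record.get("approver"):
        return None  # intermediate output: do not create a UDID
    return {
        "udid": f"udid:example:{uuid.uuid4()}",  # hypothetical identifier format
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **{k: record.get(k) for k in UDID_FIELDS},
    }
```

Fields the caller has not supplied are recorded as `None` rather than omitted, which makes gaps visible when measuring UDID completeness.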
Decide what the agent may do
Use three evaluation modes.
- Recommend
- Propose
- Act

Start with Recommend, move to Propose, and only allow Act for narrow, reversible, low-risk transitions.
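The sketch below shows one possible reading of the three modes: Recommend only informs a reviewer, Propose queues a transition for approval, and Act applies a transition directly when it is on an allowlist. These interpretations, the action labels, and the `handle` function are assumptions, since this page names the modes without defining their mechanics.

```python
from enum import Enum


class Mode(Enum):
    RECOMMEND = "recommend"
    PROPOSE = "propose"
    ACT = "act"


def handle(mode: Mode, transition: str, low_risk_reversible: frozenset) -> dict:
    """Decide what happens to an agent's suggested transition under each mode."""
    if mode is Mode.RECOMMEND:
        # The agent only informs a reviewer; no transition is queued.
        return {"action": "notify_reviewer", "transition": None}
    if mode is Mode.PROPOSE:
        # The transition is queued but needs approval before it applies.
        return {"action": "await_approval", "transition": transition}
    if transition in low_risk_reversible:
        return {"action": "apply", "transition": transition}
    # Act requested for a transition outside the allowlist: fall back to Propose.
    return {"action": "await_approval", "transition": transition}
```

For example, with `low_risk_reversible = frozenset({"insufficient_evidence"})`, an agent in Act mode could return a Claim for more evidence on its own, but a move to `actioned` would still wait for approval.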
Test the evaluation
Use a test set before connecting the evaluation to real state changes. Create cases for:
- Valid Claim with complete evidence
- Missing required evidence
- Invalid evidence hash
- Stale evidence
- Conflicting evidence
- Unauthorized agent
- Wrong claim type
- Revoked UCAN
- Prompt injection in evidence
- Score below threshold
- Boundary score
- Human disagreement
- Disputed determination
- Payment-triggering decision
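A table-driven harness is one way to keep such a test set honest. The sketch below covers a subset of the cases above. The `route` function is a hypothetical stand-in for a call into your Flow, and the expected outcomes and the 0.8 default threshold are assumptions to replace with your protocol's rules.

```python
def route(case: dict) -> str:
    """Stand-in router (hypothetical); replace with a call into your Flow."""
    if case.get("ucan_revoked") or not case.get("authorized", True):
        return "unauthorized"
    if case.get("missing_evidence") or case.get("stale"):
        return "insufficient_evidence"
    if case.get("conflicting"):
        return "conflict_detected"
    if case.get("score", 0.0) >= case.get("threshold", 0.8):
        return "determined"
    return "review_required"


# (case name, case input, expected Flow state)
CASES = [
    ("valid claim, complete evidence", {"score": 0.9}, "determined"),
    ("missing required evidence", {"score": 0.9, "missing_evidence": True}, "insufficient_evidence"),
    ("stale evidence", {"score": 0.9, "stale": True}, "insufficient_evidence"),
    ("conflicting evidence", {"score": 0.9, "conflicting": True}, "conflict_detected"),
    ("unauthorized agent", {"score": 0.9, "authorized": False}, "unauthorized"),
    ("revoked UCAN", {"score": 0.9, "ucan_revoked": True}, "unauthorized"),
    ("score below threshold", {"score": 0.5}, "review_required"),
    ("boundary score", {"score": 0.8}, "determined"),
]


def run_cases(router, cases):
    """Return (name, expected, actual) for every case the router gets wrong."""
    failures = []
    for name, case, expected in cases:
        actual = router(case)
        if actual != expected:
            failures.append((name, expected, actual))
    return failures
```

An empty list from `run_cases(route, CASES)` means every case was routed as expected. Authority checks run first in the stand-in, so an unauthorized agent never reaches scoring.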
Evaluation metrics
Track operational quality, not only model quality.
- Authorization accuracy
- Evidence citation coverage
- Rubric adherence
- False approval rate
- False rejection rate
- Escalation quality
- UDID completeness
- Human override rate
- Audit replay success
- State transition accuracy
- Time to determination
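Three of these metrics can be computed from pairs of agent recommendation and final determination. The definitions in the sketch below are assumptions, because this page names the metrics without defining them: false approval rate is taken over cases that were not finally approved, false rejection rate over cases that were, and human override rate over all cases.

```python
def rate(numerator: int, denominator: int) -> float:
    """Safe ratio that returns 0.0 when there is nothing to measure."""
    return numerator / denominator if denominator else 0.0


def evaluation_metrics(records) -> dict:
    """records: iterable of (agent_recommendation, final_determination) pairs."""
    records = list(records)
    final_approved = [r for r in records if r[1] == "approve"]
    final_not_approved = [r for r in records if r[1] != "approve"]
    return {
        # Agent said approve, but the final determination was something else.
        "false_approval_rate": rate(
            sum(agent == "approve" for agent, _ in final_not_approved),
            len(final_not_approved),
        ),
        # Agent said reject, but the final determination was approve.
        "false_rejection_rate": rate(
            sum(agent == "reject" for agent, _ in final_approved),
            len(final_approved),
        ),
        # Final determination differs from the agent's recommendation.
        "human_override_rate": rate(
            sum(agent != final for agent, final in records),
            len(records),
        ),
    }
```

False approvals matter most once a determination can trigger payments or credentials, so that rate is worth tracking separately from overall agreement.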
Common failure modes
- The agent invents evidence
- The agent has too much authority
- The agent uses stale context
- The rubric is too vague
- The model output is treated as truth
- The agent approves its own work
- The Flow cannot explain a decision
First implementation move
Build one agent-assisted evaluation that cannot directly approve, pay, issue, or update state. Define:
- one Claim type
- one Claim Collection
- one Flow
- one Agentic Oracle or agent DID
- one UCAN delegation
- one rubric
- one Evaluation Claim schema
- one UDID schema
- one human review step
- one dispute path
- one production metric dashboard
Production checklist
Before launch, confirm:
- the agent has a DID
- every evaluation action requires UCAN authority
- UCAN scopes are narrow and expire
- Claims have typed schemas and evidence references
- evidence can be resolved and verified
- the rubric is versioned
- the Flow has explicit states and failure paths
- the agent emits structured Evaluation Claims
- the UDID records authority, evidence, decision, impact, state transition, and proof
- irreversible actions require human, protocol, or governance approval
- disputes can be submitted and resolved
- reviewers can replay the evaluation from stored records
- revoked authority blocks future actions
- payment, credential, and state update actions cannot execute without valid determination authority
Example: agent-assisted evidence review
A field operator submits a Claim that a clean cooking device was used during a reporting period. The Qi Flow:
- receives the Claim
- checks that the Evidence Review Oracle has UCAN authority to inspect this claim type
- retrieves linked telemetry, field report, device entity, household entity, and active protocol rules
- asks the agent to apply the usage review rubric
- records an Evaluation Claim with findings, evidence references, score, recommendation, and limitations
- routes the recommendation to a human verifier because the score is below the automatic threshold
- records a UDID after the verifier decides to request more evidence
- updates the Flow state to `insufficient_evidence`
- notifies the claimant about the missing or conflicting evidence