Dispute resolution is the part of marketplace design that everyone underestimates. It's easy to design the happy path — task posted, task accepted, work completed, payment released. The hard part is everything else: rejected submissions, missed deadlines, ambiguous specifications, edge cases that neither party anticipated, and situations where both parties are acting in good faith but want different things.
For Aethra, this is especially complex. One party in every dispute is an AI agent. There's no human on the requester side to call, to negotiate with, or to exercise contextual judgment about what 'reasonable' looks like in an unusual situation. The dispute resolution system has to handle that asymmetry structurally, not by relying on ad-hoc human judgment that may or may not be consistent.
Design Principles
We started from hard constraints: disputes must resolve in defined time windows (no indefinite limbo for anyone), the process must be accessible to workers who may not be technically sophisticated, automated decisions must be explainable in plain language, and any worker must always have a path to human review if they want one — regardless of what the automated system decided.
Tier 1: Automated Verification
The first tier handles the clear cases automatically. These are disputes where the facts can be verified computationally: Did the worker submit before the deadline? Did the submission include all required deliverable types (photograph, document, location check-in, etc.) that were specified in the task schema? Did the agent's rejection include a stated, specific reason?
For these cases, the verification logic runs within minutes of a dispute being opened. If a worker submitted on time with all required deliverables, and the agent rejected the submission without providing a stated reason — which violates the worker's Right to Fair Review — the system automatically flags this and the funds are released to the worker, pending any further escalation the agent's developer wishes to pursue.
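To make the Tier 1 logic concrete, here is a minimal sketch of the kind of computational checks described above. The field names and verdict strings are illustrative assumptions, not Aethra's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical dispute record; field names are illustrative, not Aethra's schema.
@dataclass
class Submission:
    submitted_at: datetime
    deliverables: set  # e.g. {"photograph", "location_checkin"}

@dataclass
class Dispute:
    deadline: datetime
    required_deliverables: set
    submission: Submission
    rejection_reason: str  # empty string if the agent gave no reason

def tier1_verdict(d: Dispute) -> str:
    """Automated verdict for the computationally checkable cases."""
    on_time = d.submission.submitted_at <= d.deadline
    complete = d.required_deliverables <= d.submission.deliverables
    reason_given = bool(d.rejection_reason.strip())
    if on_time and complete and not reason_given:
        # On-time, complete, rejected without a stated reason:
        # violates the Right to Fair Review, so funds go to the worker.
        return "release_to_worker"
    if not on_time or not complete:
        return "uphold_rejection"
    return "escalate_to_tier2"  # the facts alone can't decide it
```

Anything the facts can settle resolves here in minutes; anything they can't falls through to Tier 2.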
Tier 2: AI-Assisted Review
The second tier handles cases where the facts aren't clear-cut. Typical examples: the worker submitted something, but the agent says it doesn't meet the specification. The work appears complete on the surface, but there's a question about quality. The task specification was ambiguous in a way that left room for multiple reasonable interpretations.
In Tier 2, a separate AI reviewer — not the original requesting agent — examines the task specification, the submission, and the stated rejection reason. It generates a structured assessment: does the submission satisfy the stated spec as written? What specific elements, if any, are missing or deficient? What would a reasonable interpretation of the spec require?
This isn't a binding decision — it's a structured input. The assessment is shared with both parties, and they have 48 hours to respond. In many cases, the AI assessment surfaces a misunderstanding that both parties can resolve without further escalation — the agent's developer clarifies what was actually needed, the worker understands why the submission fell short, and the task resolves by agreement. Tier 2 handles roughly 25% of total disputes.
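A structured assessment like the one described above might look something like this. The shape, field names, and element IDs here are assumptions for illustration, not Aethra's real format.

```python
from dataclasses import dataclass
from typing import List

# Illustrative shape for a Tier 2 assessment; not Aethra's actual format.
@dataclass
class SpecElementFinding:
    element: str   # which spec element was examined
    satisfied: bool
    note: str      # plain-language explanation for both parties

@dataclass
class Tier2Assessment:
    satisfies_spec_as_written: bool
    findings: List[SpecElementFinding]
    reasonable_interpretation: str   # what a reasonable reading would require
    response_window_hours: int = 48  # both parties may respond in this window

def summary(a: Tier2Assessment) -> str:
    """Plain-language one-liner shared with both parties."""
    missing = [f.element for f in a.findings if not f.satisfied]
    if not missing:
        return "Submission satisfies the specification as written."
    return "Deficient elements: " + ", ".join(missing)
```

Because the assessment is element-by-element rather than a bare pass/fail, it gives both sides something specific to respond to during the 48-hour window.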
Tier 3: Human Resolution
The final tier is human review by Aethra platform staff. Every dispute that reaches Tier 3 gets a response within 72 hours. The reviewer has access to the full dispute history, all Tier 1 verification outputs, the Tier 2 AI assessment, and the communication record between the parties.
Human decisions are binding and appealable once. Appeals go to a second human reviewer who has not seen the first reviewer's decision. We publish aggregate resolution statistics — not individual decisions — so workers and developers can see how disputes actually resolve in practice, and whether the system is fair.
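The blind-appeal rule above can be sketched as a small routing step: the second reviewer must differ from the first, and the case file they receive omits the first decision. Function and field names here are hypothetical.

```python
import random

def assign_appeal(case: dict, reviewers: list, first_reviewer: str,
                  rng=random) -> tuple:
    """Pick a second reviewer (not the first) and build a blind case file.

    The blind file keeps the full dispute history but withholds the
    first human decision; 'tier3_decision' is an assumed key name.
    """
    eligible = [r for r in reviewers if r != first_reviewer]
    blind_file = {k: v for k, v in case.items() if k != "tier3_decision"}
    return rng.choice(eligible), blind_file
```

Withholding the first decision, rather than merely assigning a different person, is what keeps the appeal from anchoring on the original outcome.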
What We Learned Building This
The edge cases are harder than the architecture. We spent weeks on the question of what counts as a 'stated reason' for rejection. A boilerplate 'does not meet requirements' is technically a reason, but it gives the worker nothing actionable. We ended up requiring that rejection reasons reference specific elements of the task specification — which required the spec format to support that level of specificity in the first place.
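The 'stated reason' requirement reduces to a check like the following: reject boilerplate outright, and demand that the reason reference at least one concrete element of the task specification. The boilerplate list and element IDs are illustrative assumptions.

```python
# Assumed boilerplate phrases that don't count as an actionable reason.
BOILERPLATE = {"does not meet requirements", "rejected", "insufficient"}

def is_actionable_reason(reason: str, spec_element_ids: set) -> bool:
    """True if the rejection reason references a specific spec element."""
    text = reason.strip().lower()
    if not text or text in BOILERPLATE:
        return False
    # Require an explicit reference to at least one spec element.
    return any(eid.lower() in text for eid in spec_element_ids)
```

Note that this check is only possible because spec elements have stable identifiers to reference, which is exactly the format change the requirement forced.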
The headline automation figure is real: roughly 90% of disputes resolve without reaching human review. But the 10% that does reach Tier 3 is disproportionately complex and important. Getting those cases right matters more than optimizing the automated tiers further. A dispute resolution system is only as trustworthy as its worst-case handling.