Moderation Policy
Moderation in Agentarium is the smallest amount of human review that maintains trust. Most decisions are made by the automated conformance gate; humans only enter when judgment is required.
What the conformance gate does (no humans involved)
The gate runs on every submission and checks:
- All required fields are present and non-empty.
- Required-field content is substantive (no "n/a", "none", "tbd" placeholders).
- The validation block has a non-trivial caveat — "none" is rejected.
- Every tool the agent references in its prompt is registered and declared.
- Declared tools are actually mentioned in the prompt (warning, not block).
- The submitted record validates against agent.schema.json.
- Guardrails are declared as discrete mechanisms, not as adjectives.
- The author exists, has ORCID-verified status, and (for first-time authors) has an endorsement.
Submissions that fail the gate get specific error messages and never enter the human queue. Authors fix and resubmit.
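A few of these checks can be sketched in code. The following is only an illustration, not the registry's actual gate: the record layout, the `run_gate` function, and the `tool:<name>` convention for spotting tool references in a prompt are all assumptions made for this example. It covers the required-field, placeholder, and tool checks, and shows how a block (error) differs from a warning.

```python
import re
from dataclasses import dataclass, field

# Placeholder strings the policy names as non-substantive.
PLACEHOLDERS = {"n/a", "none", "tbd"}


@dataclass
class GateResult:
    errors: list = field(default_factory=list)    # block the submission
    warnings: list = field(default_factory=list)  # reported, do not block

    @property
    def passed(self) -> bool:
        return not self.errors


def run_gate(record: dict, required: list, registered_tools: set) -> GateResult:
    """Hypothetical gate covering a subset of the checks listed above."""
    result = GateResult()
    # Required fields must be present, non-empty, and not a placeholder.
    for name in required:
        value = str(record.get(name, "")).strip()
        if not value:
            result.errors.append(f"{name}: missing or empty")
        elif value.lower() in PLACEHOLDERS:
            result.errors.append(f"{name}: placeholder value {value!r}")

    declared = set(record.get("tools", []))
    # Assumed convention: the prompt references tools as `tool:<name>`.
    referenced = set(re.findall(r"tool:([\w-]+)", record.get("prompt", "")))

    # Referenced tools must be both declared and registered (block).
    for tool in sorted(referenced):
        if tool not in declared or tool not in registered_tools:
            result.errors.append(f"tool {tool}: referenced but not registered and declared")
    # Declared tools that the prompt never mentions only produce a warning.
    for tool in sorted(declared - referenced):
        result.warnings.append(f"tool {tool}: declared but not mentioned in prompt")
    return result
```

Each error string names the field or tool at fault, which mirrors the "specific error messages" authors receive before they fix and resubmit.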
Auto-approval matrix
| Action | Auto-approved when | Human-reviewed when |
|---|---|---|
| Agent submission, endorsed author | Gate passes + tools resolve + on-topic domain | Borderline domain, manual flag |
| Agent submission, unendorsed author | Never | Always: endorsement is reviewed first, then the submission follows the endorsed-author path |
| Tool registration, remote-query | Auth declared, schema valid, URL reachable | Borderline permission scope |
| Tool registration, local-action | Never | Always |
| Author withdrawal of own agent | ORCID match | Identity dispute |
| Agent supersession (replace own) | Same author + same concept_id | Cross-author supersession |
| Endorsement request | Never | Always, by one moderator |
| Public flag report | Never | Always (within 7 days) |
The principle: conformance is mechanical, identity and scope are human.
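The first four rows of the matrix can be read as a small routing function. This is a minimal sketch under stated assumptions: the `Route` enum, the function names, and the boolean inputs are hypothetical, and sending every case that misses the auto-approval conditions to human review is an assumption where the matrix is silent.

```python
from enum import Enum


class Route(Enum):
    REJECTED = "rejected-by-gate"  # failed the gate; never enters the human queue
    AUTO = "auto-approve"
    HUMAN = "human-review"


def route_agent_submission(*, gate_passed: bool, author_endorsed: bool,
                           tools_resolve: bool, on_topic: bool,
                           manual_flag: bool = False) -> Route:
    if not gate_passed:
        return Route.REJECTED
    if not author_endorsed:
        # Unendorsed authors are never auto-approved; endorsement comes first.
        return Route.HUMAN
    if tools_resolve and on_topic and not manual_flag:
        return Route.AUTO
    # Assumption: anything else (borderline domain, manual flag) goes to a human.
    return Route.HUMAN


def route_tool_registration(*, kind: str, auth_declared: bool, schema_valid: bool,
                            url_reachable: bool, borderline_scope: bool = False) -> Route:
    if kind == "local-action":
        return Route.HUMAN  # local-action tools are always human-reviewed
    if (kind == "remote-query" and auth_declared and schema_valid
            and url_reachable and not borderline_scope):
        return Route.AUTO
    return Route.HUMAN
```

Keyword-only arguments keep call sites readable, since every input is a boolean that is easy to transpose.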
Service-level targets
| Action | Target turnaround |
|---|---|
| First-time author endorsement | 72 hours |
| Tool registration review | 72 hours |
| Flag report review | 7 days |
| Withdrawal processing | 24 hours |
| Appeal review | 72 hours |
Targets, not guarantees. The current queue depth and any unmet SLAs are shown on /about.
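As an illustration of how an unmet target could be detected, the table can be expressed as durations and compared against the time an item was opened. The action keys and function names below are hypothetical, and treating an item as overdue only once the target has strictly passed is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Target turnaround per action, taken from the table above.
SLA_TARGETS = {
    "endorsement": timedelta(hours=72),
    "tool_registration": timedelta(hours=72),
    "flag_report": timedelta(days=7),
    "withdrawal": timedelta(hours=24),
    "appeal": timedelta(hours=72),
}


def target_deadline(action: str, opened_at: datetime) -> datetime:
    """Time by which the item should be resolved to meet its target."""
    return opened_at + SLA_TARGETS[action]


def is_overdue(action: str, opened_at: datetime, now: datetime) -> bool:
    """True once the target has passed without resolution."""
    return now > target_deadline(action, opened_at)


opened = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(target_deadline("withdrawal", opened).isoformat())  # 2025-01-02T12:00:00+00:00
```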
What gets moderated, in practice
First-time author endorsement
A new author can't publish without one moderator (or qualified endorser) signing off. See Endorsement Policy for the full process.
Off-topic submissions
If an author selects "something else / off-topic" or the domain classifier flags the submission as borderline, the submission routes to moderation. Moderators decide whether the submission belongs in a recognized scientific domain or whether the registry isn't the right home for it. Off-topic decisions include a redirect suggestion when possible.
Tool registrations
- Remote-query tools auto-register if the endpoint responds to a health check with the declared schema. Borderline permission claims (e.g., "remote-query" for a tool that takes actions) go to human review.
- Local-action tools always get human review. The review is heavier: documentation must clearly state what files/processes the tool touches, what user data it transmits, and what authorization is required. Local-action tools may be approved with documented warnings shown to consumers.
Flag reports
Anyone can flag a listed agent or tool. Flag reasons:
- misrepresented_validation: the validation block doesn't match the agent's actual behavior
- broken_tool: endpoint has been unreachable for more than the grace period
- fraud: fabricated authorship, fake ORCID, fake institution
- off_topic: listing doesn't belong in a scientific registry
- safety: the agent or tool poses a safety concern
- other: anything else; include a description
Moderators review within 7 days. The accused author is notified, can respond, and the decision goes to the public audit log with reasons.
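A flag report could be modeled as a closed set of reason codes plus a free-text description. The sketch below is an assumption about shape, not the registry's data model: `FlagReport` and its fields are hypothetical, and only the reason codes and the rule that "other" needs a description come from the policy.

```python
from dataclasses import dataclass
from enum import Enum


class FlagReason(Enum):
    MISREPRESENTED_VALIDATION = "misrepresented_validation"
    BROKEN_TOOL = "broken_tool"
    FRAUD = "fraud"
    OFF_TOPIC = "off_topic"
    SAFETY = "safety"
    OTHER = "other"


@dataclass(frozen=True)
class FlagReport:
    target_id: str          # hypothetical identifier of the flagged agent or tool
    reason: FlagReason
    description: str = ""

    def __post_init__(self):
        # The policy asks for a description when the reason is "other".
        if self.reason is FlagReason.OTHER and not self.description.strip():
            raise ValueError("reason 'other' requires a description")


report = FlagReport("agent-123", FlagReason("broken_tool"))
```

Looking up a reason with `FlagReason("...")` raises `ValueError` for an unknown code, so unrecognized reasons are rejected before a report reaches the queue.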
Withdrawals
- Voluntary withdrawal (author's own listing): ORCID-verified author, 24-hour processing, agent stays citable with a public withdrawal notice.
- Forced withdrawal (moderator-initiated): requires two moderators (one to propose, one to approve), full audit log entry, author appeal option.
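The two-moderator rule for forced withdrawals amounts to a separation-of-duties check. A minimal sketch follows, assuming moderators are identified by string IDs and that the audit entry carries a reason; both are illustrative choices, as are the names `ForcedWithdrawal` and `approve_forced_withdrawal`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForcedWithdrawal:
    listing_id: str
    proposed_by: str
    approved_by: str
    reason: str


def approve_forced_withdrawal(listing_id: str, proposed_by: str,
                              approved_by: str, reason: str) -> ForcedWithdrawal:
    # One moderator proposes, a different moderator approves.
    if proposed_by == approved_by:
        raise ValueError("proposer and approver must be different moderators")
    # Assumption: a full audit log entry includes a non-empty reason.
    if not reason.strip():
        raise ValueError("a reason is required for the audit log entry")
    return ForcedWithdrawal(listing_id, proposed_by, approved_by, reason)
```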
Appeals
Every moderation decision can be appealed once. The appeal:
- Routes to a different moderator than the original.
- Must be filed within 30 days of the original decision.
- Is itself logged in the audit trail.
- Results in: uphold, reverse, or modify (with new reason).
There is no second-level appeal in v1. If a community-wide concern arises, it goes through the policy-change process (see Governance, "Changes to these policies").
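The appeal rules reduce to two checks: whether an appeal may be filed, and who may review it. The sketch below is illustrative only. The function names are hypothetical, the 30-day window is treated as inclusive (an assumption), and picking the first eligible moderator stands in for whatever assignment process is actually used.

```python
from datetime import datetime, timedelta, timezone

APPEAL_WINDOW = timedelta(days=30)


def can_appeal(decided_at: datetime, filed_at: datetime, already_appealed: bool) -> bool:
    """One appeal per decision, filed within 30 days of that decision."""
    if already_appealed:
        return False  # no second-level appeal in v1
    return decided_at <= filed_at <= decided_at + APPEAL_WINDOW


def pick_appeal_moderator(moderators: list, original_moderator: str) -> str:
    """Return a moderator other than the one who made the original decision."""
    eligible = [m for m in moderators if m != original_moderator]
    if not eligible:
        raise ValueError("no moderator other than the original is available")
    return eligible[0]  # placeholder for a real assignment rule
```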
Moderator selection and accountability
- Founding pool: 1 UAH staff moderator + 2–3 AKD/NASA-IMPACT moderators.
- New moderators are nominated by an existing moderator and require approval from one other moderator. Conflicts are disclosed publicly on the new moderator's profile.
- Moderator terms: 2 years renewable. Moderators can step down at any time.
- Each moderator's actions are visible on their public profile. This is intentional: moderation power is paired with public traceability.
What we don't do
- We don't pre-review correctness. Author claims are author claims.
- We don't moderate the content of an agent's prompts beyond the safety screen. The gate checks structure; the prompt's scientific quality is the reader's job.
- We don't act on anonymous accusations without independent evidence.
- We don't moderate based on funding source, institutional affiliation, or other non-conduct attributes.
When in doubt
Moderators err toward the lighter action:
- A confusing submission gets a clarification request, not rejection.
- A borderline domain gets a question to the author, not a block.
- A first-time mistake gets a fix-and-resubmit, not a strike.
- Only a clear bad-faith pattern gets the heavier action.
This is the same posture arXiv takes, and we adopt it for the same reason: the registry's credibility comes from being predictable and fair, not from being strict.