Moderation Policy

Moderation in Agentarium uses the smallest amount of human review that maintains trust. The automated conformance gate makes most decisions; humans step in only when judgment is required.

What the conformance gate does (no humans involved)

The gate runs on every submission and checks:

All required fields are present and non-empty.
Required-field content is substantive (no "n/a", "none", "tbd" placeholders).
The validation block has a non-trivial caveat — "none" is rejected.
Every tool the agent references in its prompt is registered and declared.
Every declared tool is actually mentioned in the prompt (a failure here produces a warning, not a block).
The submitted record validates against agent.schema.json.
Guardrails are declared as discrete mechanisms, not as adjectives.
The author exists, is ORCID-verified, and (for first-time authors) has an endorsement.
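
To make the field, caveat, and tool checks concrete, here is a minimal Python sketch. It is not the gate's implementation: the field names, the placeholder list, and the convention that a prompt references a tool as "tool:<name>" are assumptions made for illustration, and the schema, guardrail, and author checks are omitted.

```python
import re
from dataclasses import dataclass, field

# Hypothetical field names; the authoritative list is agent.schema.json.
REQUIRED_FIELDS = ("name", "description", "prompt", "validation")
PLACEHOLDERS = {"n/a", "none", "tbd"}
# Hypothetical convention: a prompt references a tool as "tool:<name>".
TOOL_REF = re.compile(r"tool:([A-Za-z0-9_-]+)")


@dataclass
class GateResult:
    errors: list[str] = field(default_factory=list)    # block the submission
    warnings: list[str] = field(default_factory=list)  # reported, never blocking

    @property
    def passed(self) -> bool:
        return not self.errors


def is_placeholder(text: str) -> bool:
    return text.strip().lower() in PLACEHOLDERS


def run_gate(sub: dict, registered_tools: set[str]) -> GateResult:
    res = GateResult()

    # Required fields: present, non-empty, and not a placeholder string.
    for key in REQUIRED_FIELDS:
        value = sub.get(key)
        if not value:
            res.errors.append(f"{key}: missing or empty")
        elif isinstance(value, str) and is_placeholder(value):
            res.errors.append(f"{key}: placeholder value {value!r}")

    # The validation block needs a substantive caveat; "none" is rejected.
    validation = sub.get("validation")
    caveat = validation.get("caveat", "") if isinstance(validation, dict) else ""
    if not caveat.strip() or is_placeholder(caveat):
        res.errors.append("validation.caveat: must describe a real limitation")

    # Tools referenced in the prompt must be registered and declared.
    referenced = set(TOOL_REF.findall(sub.get("prompt") or ""))
    declared = set(sub.get("tools") or [])
    for tool in sorted(referenced - registered_tools):
        res.errors.append(f"tool {tool!r}: not registered")
    for tool in sorted(referenced - declared):
        res.errors.append(f"tool {tool!r}: referenced but not declared")

    # Declared but never mentioned: a warning, not a block.
    for tool in sorted(declared - referenced):
        res.warnings.append(f"tool {tool!r}: declared but not mentioned in the prompt")
    return res
```

For example, a submission whose caveat is "none" fails with a single error, while a declared tool that the prompt never names only adds a warning.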

Submissions that fail the gate get specific error messages and never enter the human queue. Authors fix and resubmit.

Auto-approval matrix

Action	Auto-approved when	Human-reviewed when
Agent submission, endorsed author	Gate passes + tools resolve + on-topic domain	Borderline domain, manual flag
Agent submission, unendorsed author	Never	Always, for endorsement; once endorsed, the endorsed-author row applies
Tool registration, remote-query	Auth declared, schema valid, URL reachable	Borderline permission scope
Tool registration, local-action	Never	Always
Author withdrawal of own agent	ORCID match	Identity dispute
Agent supersession (replace own)	Same author + same concept_id	Cross-author supersession
Endorsement request	Never	Always (one moderator)
Public flag report	Never	Always (within 7 days)

The principle: conformance is mechanical, identity and scope are human.
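
The matrix reduces to a lookup: each action has a set of conditions that must all hold for auto-approval (or none at all, for "Never"), plus triggers that force human review regardless. The following table-driven Python sketch uses hypothetical action keys and condition names as labels for the rows. Sending a request whose auto-approval conditions are unmet to human review is an assumption, since the matrix doesn't say. Gate failures are not modeled, because they are returned to the author before routing.

```python
from typing import Optional

# action -> (conditions that must all hold for auto-approval, or None for
#            "Never"; triggers that force human review even if they hold).
# Keys and condition names are hypothetical labels for the matrix rows.
MATRIX: dict[str, tuple[Optional[frozenset[str]], frozenset[str]]] = {
    "agent_submission_endorsed": (
        frozenset({"tools_resolve", "on_topic"}),
        frozenset({"borderline_domain", "manual_flag"}),
    ),
    "agent_submission_unendorsed": (None, frozenset()),
    "tool_registration_remote_query": (
        frozenset({"auth_declared", "schema_valid", "url_reachable"}),
        frozenset({"borderline_scope"}),
    ),
    "tool_registration_local_action": (None, frozenset()),
    "withdrawal_own_agent": (frozenset({"orcid_match"}), frozenset({"identity_dispute"})),
    "supersession": (
        frozenset({"same_author", "same_concept_id"}),
        frozenset({"cross_author"}),
    ),
    "endorsement_request": (None, frozenset()),
    "flag_report": (None, frozenset()),
}


def route(action: str, facts: set[str]) -> str:
    """Return "auto" or "human" for a request that has already passed the gate."""
    auto_if, escalate_if = MATRIX[action]
    if auto_if is None:            # "Never" in the auto-approved column
        return "human"
    if facts & escalate_if:        # any escalation trigger is present
        return "human"
    if auto_if <= facts:           # every auto-approval condition holds
        return "auto"
    return "human"                 # assumption: unmet conditions go to review
```

Keeping the rules in one table means a policy change edits data, not control flow.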

Service-level targets

Action	Target turnaround
First-time author endorsement	72 hours
Tool registration review	72 hours
Flag report review	7 days
Withdrawal processing	24 hours
Appeal review	72 hours

These are targets, not guarantees. Current queue depth and any missed targets are shown on /about.
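
Each target is a fixed offset from when a request arrives. Below is a Python sketch of how a status page such as /about might compute due times and count missed targets. It assumes wall-clock hours (the policy doesn't say whether weekends count) and uses hypothetical action keys.

```python
from datetime import datetime, timedelta, timezone

# Turnaround targets from the table; keys are hypothetical labels.
SLA_TARGETS = {
    "endorsement": timedelta(hours=72),
    "tool_registration": timedelta(hours=72),
    "flag_report": timedelta(days=7),
    "withdrawal": timedelta(hours=24),
    "appeal": timedelta(hours=72),
}


def due_at(action: str, received_at: datetime) -> datetime:
    # Assumption: wall-clock time, with no business-day adjustment.
    return received_at + SLA_TARGETS[action]


def is_overdue(action: str, received_at: datetime, now: datetime) -> bool:
    return now > due_at(action, received_at)


def missed_targets(queue: list[tuple[str, datetime]], now: datetime) -> dict[str, int]:
    """Count still-open requests that are past their target, by action."""
    counts: dict[str, int] = {}
    for action, received_at in queue:
        if is_overdue(action, received_at, now):
            counts[action] = counts.get(action, 0) + 1
    return counts


received = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(due_at("withdrawal", received))  # 2025-01-02 12:00:00+00:00
```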

What gets moderated, in practice

First-time author endorsement

A new author can't publish until one moderator (or a qualified endorser) signs off. See the Endorsement Policy for the full process.

Off-topic submissions

If an author selects "something else / off-topic" or the domain classifier flags the submission as borderline, the submission routes to moderation. Moderators decide whether the submission belongs in a recognized scientific domain or whether the registry isn't the right home for it. Off-topic decisions include a redirect suggestion when possible.

Tool registrations

Remote-query tools auto-register if the endpoint responds to a health check with the declared schema. Borderline permission claims (e.g., "remote-query" for a tool that takes actions) go to human review.
Local-action tools always get human review. The review is heavier: documentation must clearly state what files/processes the tool touches, what user data it transmits, and what authorization is required. Local-action tools may be approved with documented warnings shown to consumers.

Flag reports

Anyone can flag a listed agent or tool. Flag reasons:

misrepresented_validation — the validation block doesn't match the agent's actual behavior
broken_tool — endpoint has been unreachable for more than the grace period
fraud — fabricated authorship, fake ORCID, fake institution
off_topic — listing doesn't belong in a scientific registry
safety — the agent or tool poses a safety concern
other — anything else; include a description
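
Because the flag reasons form a closed set, a flag form or API can reject anything else up front. Here is a Python sketch that uses a hypothetical FlagReport record and treats "include a description" for other as a hard requirement:

```python
from dataclasses import dataclass
from enum import Enum


class FlagReason(Enum):
    MISREPRESENTED_VALIDATION = "misrepresented_validation"
    BROKEN_TOOL = "broken_tool"
    FRAUD = "fraud"
    OFF_TOPIC = "off_topic"
    SAFETY = "safety"
    OTHER = "other"


@dataclass(frozen=True)
class FlagReport:
    target_id: str              # hypothetical: ID of the flagged agent or tool
    reason: FlagReason
    description: str = ""

    def __post_init__(self) -> None:
        # "other" is only actionable with a description.
        if self.reason is FlagReason.OTHER and not self.description.strip():
            raise ValueError("flags with reason 'other' must include a description")


def parse_flag(target_id: str, reason: str, description: str = "") -> FlagReport:
    # FlagReason(value) raises ValueError for any reason not in the list.
    return FlagReport(target_id, FlagReason(reason), description)
```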

Moderators review flags within 7 days. The flagged author is notified and can respond, and the decision is recorded in the public audit log with its reasons.

Withdrawals

Voluntary withdrawal (author's own listing): the requester's ORCID must match the author's; the processing target is 24 hours; the agent stays citable, with a public withdrawal notice.
Forced withdrawal (moderator-initiated): requires two moderators (one to propose, one to approve), full audit log entry, author appeal option.
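
The two-moderator rule for forced withdrawals is a separation-of-duties check: the moderator who proposes cannot also approve. Here is a minimal Python sketch. The ForcedWithdrawal record and the shape of the audit entry it returns are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ForcedWithdrawal:
    agent_id: str
    proposed_by: str                  # first moderator: proposes
    reason: str
    approved_by: Optional[str] = None

    @property
    def effective(self) -> bool:
        # Nothing takes effect until a second moderator approves.
        return self.approved_by is not None

    def approve(self, moderator: str) -> dict:
        if self.effective:
            raise ValueError("already approved")
        if moderator == self.proposed_by:
            raise PermissionError("the proposing moderator cannot also approve")
        self.approved_by = moderator
        # Hypothetical audit-log entry; the author is then offered an appeal.
        return {
            "action": "forced_withdrawal",
            "agent_id": self.agent_id,
            "proposed_by": self.proposed_by,
            "approved_by": moderator,
            "reason": self.reason,
            "at": datetime.now(timezone.utc).isoformat(),
        }
```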

Appeals

Every moderation decision can be appealed once. The appeal:

Routes to a different moderator than the original.
Must be filed within 30 days of the original decision.
Is itself logged in the audit trail.
Results in one of: uphold, reverse, or modify (with a new reason).
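
The appeal rules are mechanical enough to enforce in code: one appeal per decision, a 30-day window, a reviewer other than the original moderator, and three possible outcomes. Here is a Python sketch. The Decision record, the function names, and picking the first eligible moderator as reviewer are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

APPEAL_WINDOW = timedelta(days=30)
OUTCOMES = {"uphold", "reverse", "modify"}


@dataclass
class Decision:
    decision_id: str
    moderator: str
    decided_at: datetime
    appealed: bool = False


def file_appeal(decision: Decision, filed_at: datetime) -> None:
    if decision.appealed:
        raise ValueError("each decision can be appealed only once")
    if filed_at - decision.decided_at > APPEAL_WINDOW:
        raise ValueError("the 30-day appeal window has closed")
    decision.appealed = True


def assign_reviewer(decision: Decision, moderators: list[str]) -> str:
    # Must differ from the original moderator; taking the first eligible
    # one is a placeholder for real assignment (e.g. load balancing).
    eligible = [m for m in moderators if m != decision.moderator]
    if not eligible:
        raise LookupError("no eligible moderator for this appeal")
    return eligible[0]


def resolve(outcome: str, new_reason: str = "") -> dict:
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    if outcome == "modify" and not new_reason.strip():
        raise ValueError("a modified decision needs a new reason")
    return {"outcome": outcome, "reason": new_reason}
```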

There is no second-level appeal in v1. If a community-wide concern arises, it goes through the policy-change process (see Governance, "Changes to these policies").

Moderator selection and accountability

Founding pool: one UAH staff moderator plus two to three AKD/NASA-IMPACT moderators.
New moderators are nominated by an existing moderator and require approval from one other moderator. Conflicts of interest are disclosed publicly on the new moderator's profile.
Moderator terms: 2 years renewable. Moderators can step down at any time.
Each moderator's actions are visible on their public profile. This is intentional: moderation power is paired with public traceability.

What we don't do

We don't pre-review correctness. Author claims are author claims.
We don't moderate the content of an agent's prompts beyond the safety screen. The gate checks structure; the prompt's scientific quality is the reader's job.
We don't act on anonymous accusations without independent evidence.
We don't moderate based on funding source, institutional affiliation, or other non-conduct attributes.

When in doubt

Moderators err toward the lighter action:

A confusing submission gets a clarification request, not a rejection.
A borderline domain gets a question to the author, not a block.
A first-time mistake gets a fix-and-resubmit, not a strike.
A clear bad-faith pattern gets the heavier action.

This is the same posture arXiv takes, and we adopt it for the same reason: the registry's credibility comes from being predictable and fair, not from being strict.