Cross-cutting · verified · doc-vs-code correction

Content Moderation Pipeline

What moderation actually runs on user-generated prompts and assets — and why doesn't the published architecture match the code?

Last verified 2026-04-17 against code
Slug moderation-pipeline
Integration model vendored library (⚠ not shared service)
Live callers SoundBuddy · ImageBuddy · (ModerationBuddy admin-only)

§1 Claim vs reality

The existing architecture docs describe a shared moderation service. The running system doesn't use it. This diagram starts by surfacing that gap because every downstream conclusion depends on it.

What docs claim

repos.yaml:63-67: "Active for SoundBuddy; planned expansion to FuzzyCode"

docs/architecture/SERVICE_INTEGRATIONS.md:24-25: "SoundBuddy → ModerationBuddy POST /moderate"

Reading the docs, you'd expect an HTTP fan-out where each content-producing service calls ModerationBuddy before proceeding.

What code does

No service calls ModerationBuddy's POST /moderate at runtime.

Instead, each content service embeds the moderation_core/ library directly and calls OpenAI omni-moderation-latest in-process.

ModerationBuddy's HTTP endpoint exists, but is _require_admin-gated and only invoked by a batch offline tool (FuzzycodePagesFlaskServer/moderation_utility.py:38) for analyzing already-published pages.
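The in-process integration shape can be sketched as follows. This is illustrative only: `moderate_prompt` and the injected client are assumptions, not the real moderation_core API, and the categories field is simplified to a plain list. The point is the topology: the content service calls OpenAI directly, with no hop through ModerationBuddy.

```python
def moderate_prompt(client, prompt: str):
    """Run one in-process moderation call against omni-moderation-latest.

    Returns (flagged, categories). The client is injected so the same code
    path can be exercised with a stub; categories are simplified to a list
    here (the real SDK returns a structured categories object).
    """
    resp = client.moderations.create(model="omni-moderation-latest", input=prompt)
    result = resp.results[0]  # one result per input string
    return result.flagged, list(result.categories)
```

No network hop, no shared service: an OpenAI outage is felt inside each content service's own process, which is why the fail-modes in §3 matter.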

§2 The actual runtime moderation topology

```mermaid
flowchart LR
    classDef user fill:#fff460,stroke:#683c06,color:#111
    classDef svc fill:#edd9c0,stroke:#683c06,color:#111
    classDef lib fill:#c7d9e8,stroke:#1d58b1,color:#111
    classDef ext fill:#eaf3ff,stroke:#1d58b1,color:#111
    classDef absent fill:#f0f0f0,stroke:#b8432e,color:#888,stroke-dasharray: 5 5
    classDef pre fill:#fff4d6,stroke:#a47a3a,color:#111

    U([Browser]):::user

    subgraph SBFC[" "]
        FC["FuzzyCode<br/>⚠ NO content moderation<br/>only PII firewall"]
    end
    subgraph SB[" "]
        SBsvc[SoundBuddy]
        SBlib[("moderation_core<br/>embedded library")]
        SBpre["pre-check<br/>banned words"]
    end
    subgraph IB[" "]
        IBsvc[ImageBuddy]
        IBlib[("moderation_core<br/>embedded library")]
        IBpre["pre-check<br/>banned words"]
    end

    OAI["OpenAI<br/>omni-moderation-latest"]
    MB["ModerationBuddy<br/>admin-only HTTP<br/>NOT on user path"]
    OFFLINE["Pages batch job<br/>moderation_utility.py"]
    DB[("SoundBuddy<br/>moderation_history<br/>local DB")]

    U -->|/send, /sound_effect| SBsvc
    SBsvc --> SBpre
    SBpre -->|matched → FLAG| DB
    SBpre -->|clean| SBlib
    SBlib -->|HTTPS| OAI
    OAI -->|outcome| SBlib
    SBlib --> DB
    U -->|/image_*| IBsvc
    IBsvc --> IBpre
    IBpre -->|matched → FLAG| IBlib
    IBpre -->|clean| IBlib
    IBlib -->|HTTPS| OAI
    U -.->|no moderation| FC
    OFFLINE -.->|admin-key POST /moderate| MB
    MB -->|HTTPS| OAI

    class SBsvc,IBsvc,FC,MB svc
    class SBlib,IBlib lib
    class OAI ext
    class SBpre,IBpre pre
    class OFFLINE absent
    class DB lib
    class U user
```
FuzzyCode services (production)
Embedded moderation_core/ library + its local DB
External (OpenAI)
Pre-check — banned-word matcher
Admin-only / offline paths — not user-facing
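The pre-check node in the diagram is a local banned-word matcher that short-circuits before any OpenAI call. A minimal sketch, assuming a simple lowercase token match; the actual word list and matching rule live in each service's vendored moderation_core/ copy and may differ:

```python
import re

# Placeholder list for illustration only; the real list ships inside each
# service's vendored moderation_core/ copy.
BANNED_WORDS = {"forbiddenword", "anotherbadword"}

def precheck(prompt: str):
    """Return the first banned token hit, or None if the prompt is clean.

    A hit means the service records FLAGGED with categories
    ["PRECHECK_BANNED_WORD"] and never calls OpenAI for this prompt.
    """
    for token in re.findall(r"[a-z']+", prompt.lower()):
        if token in BANNED_WORDS:
            return token
    return None
```

Because the pre-check is purely local, it keeps working during an OpenAI outage; only prompts that pass it reach the adapter and hit the fail-mode behavior described in §3.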

§3 Per-service fail-mode

Same library, different fail-modes. The divergence is silent: neither service documents that it behaves differently from its sibling. An outage of OpenAI moderation therefore has different user-facing effects depending on which service was called.

| Service | Integration | Adapter init failure | Adapter runtime error | Pre-check banned word | Evidence |
|---|---|---|---|---|---|
| SoundBuddy | embedded moderation_core | FAIL-CLOSED (reject prompt) | FAIL-OPEN (allow, log TECHNICAL_FAILURE) | BLOCK | main.py:1005-1007, adapters/openai.py:140-152 |
| ImageBuddy | embedded moderation_core | FAIL-OPEN (allow) | FAIL-OPEN (allow, log TECHNICAL_FAILURE) | BLOCK | moderation.py:259-285 |
| FuzzyCode | none | — | — | — | no import of ModerationExecutor |
| SpriteBuddy | none (prompts go via ImageBuddy for content gen) | — | — | — | grep: no moderation_core import |
| ModerationBuddy | HTTP admin-only endpoint | n/a (not on user path) | FAIL-OPEN in adapter | BLOCK | main.py:689-837 |
Operational consequence. If OpenAI moderation goes down: SoundBuddy rejects new prompts (the adapter fails closed at init on cold start), ImageBuddy accepts everything, and FuzzyCode is unaffected (it never calls moderation). A single outage produces a different user experience per service, and any SLO or incident playbook needs to account for that per service.
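The table above reduces to one small sketch. Names and return shapes here are assumptions (the real logic is inline in each service's handlers, at the evidence lines cited above); what it captures is that only the init-failure branch differs between the two services:

```python
# Sketch of the §3 fail-mode policies. SoundBuddy fails closed only when the
# adapter never initialized (cold start); both services fail open on runtime
# adapter errors, recording TECHNICAL_FAILURE.

def handle_prompt(executor, prompt, fail_closed_on_init):
    if executor is None:
        # Adapter failed to initialize.
        if fail_closed_on_init:
            return {"status": "REJECTED"}            # SoundBuddy-style
        return {"status": "ALLOWED",                 # ImageBuddy-style
                "error_details": "TECHNICAL_FAILURE"}
    try:
        return executor(prompt)
    except Exception:
        # Both services fail open on runtime errors, logging the failure.
        return {"status": "ALLOWED", "error_details": "TECHNICAL_FAILURE"}
```

Unifying the two services would mean picking one value for `fail_closed_on_init`; today that choice is implicit and undocumented in both codebases.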

§4 Outcome taxonomy (what actually gets persisted)

ModerationOutcome is three independent booleans plus a free-text reason, not an enum. The persisted column (moderation_history.moderation_status) allows three values, but one of them is never written.

| Written value | When | Evidence |
|---|---|---|
| FLAGGED · categories = ["PRECHECK_BANNED_WORD"] | Pre-check matches local banned-word list; OpenAI never called | executor.py:42-58 |
| FLAGGED | OpenAI adapter returns flagged=true after violence-threshold post-processing | executor.py:73-90, adapters/openai.py:93-94 |
| ALLOWED | OpenAI adapter returns flagged=false | executor.py:73-90 |
| ALLOWED · error_details = "TECHNICAL_FAILURE" | Adapter raised; fail-open path | adapters/openai.py:140-152 |
| ERROR (dead code) | Schema CHECK allows it but no emitter writes it | 0002_moderation_history_local.sql:43 |
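The taxonomy implies a very small mapping from outcome to persisted status. A sketch under assumed field names (the real dataclass may name things differently):

```python
from dataclasses import dataclass, field

# Field names are assumptions. What matters: independent booleans plus free
# text, not an enum, and no code path ever produces the schema's ERROR value.

@dataclass
class ModerationOutcome:
    flagged: bool
    technical_failure: bool = False
    needs_review: bool = False   # exists on the dataclass, never set true anywhere
    categories: list = field(default_factory=list)

def persisted_status(outcome: ModerationOutcome) -> str:
    """Value written to moderation_history.moderation_status."""
    if outcome.flagged:
        return "FLAGGED"
    # The fail-open TECHNICAL_FAILURE path still persists as ALLOWED
    # (with error_details), which is why ERROR is dead code.
    return "ALLOWED"
```

Collapsing the booleans into an explicit status enum at write time would make the dead ERROR value either reachable or removable; today the ambiguity lives in the schema.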

§5 Gaps — verified, not speculative

No decision state machine. Moderation is a one-shot gate: a prompt arrives, an outcome is emitted, a row is written. There is no review queue (the needs_review flag exists on the dataclass but is never set to true in any code path), no appeal path, and no re-evaluation.
No moderation cache. Every prompt is re-evaluated. A repeat prompt pays the OpenAI round-trip each time.
moderation_core/ is vendored into 5 repos via install_update_moderation.sh with no version pinning. Drift between copies is expected over time. Any bug fix needs to be applied in 5 places.
FuzzyCode publishes with no content moderation. The HTML attestation path covers PII leakage (good), but user-supplied text visible on a published page — banner text, image prompts that got baked into HTML — is not screened for policy violations at publish time. Only prompts to the generative services see moderation.
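The missing moderation cache is straightforward to sketch. This is a hypothetical fix, not code that exists in any repo; it memoizes outcomes by prompt hash so a repeat prompt skips the OpenAI round-trip:

```python
import hashlib

class ModerationCache:
    """Hypothetical memoization layer over the expensive moderation call.

    Today every prompt is re-evaluated; this keys outcomes by a SHA-256 of
    the prompt text so identical prompts pay the round-trip once.
    """

    def __init__(self, moderate):
        self._moderate = moderate   # the underlying moderation callable
        self._outcomes = {}

    def check(self, prompt: str):
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self._outcomes:
            self._outcomes[key] = self._moderate(prompt)  # miss: one round-trip
        return self._outcomes[key]
```

Caching a verdict assumes the banned-word list and model thresholds are stable; any real deployment would need invalidation when either changes, plus a TTL so policy updates eventually reach previously seen prompts.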

§6 Verification pointers